ABSTRACT

The present invention provides obfuscation techniques for
enhancing software security. In one embodiment, a method
for obfuscation techniques for enhancing software security
includes selecting a subset of code (e.g., compiled source
code of an application) to obfuscate, and obfuscating the
selected subset of the code. The obfuscating includes applying
an obfuscating transformation to the selected subset of
the code. The transformed code can be weakly equivalent to
the untransformed code. The applied transformation can be
selected based on a desired level of security (e.g., resistance
to reverse engineering). The applied transformation can
include a control transformation that can be created using
opaque constructs, which can be constructed using aliasing
and concurrency techniques. Accordingly, the code can be
obfuscated for enhanced software security based on a
desired level of obfuscation (e.g., based on a desired
potency, resilience, and cost).
METRIC  METRIC NAME                 CITATION

μ1      Program Length              Halstead
        E(P) increases with the number of operators and operands in P.

μ2      Cyclomatic Complexity       McCabe
        E(F) increases with the number of predicates in F.

μ3      Nesting Complexity          Harrison
        E(F) increases with the nesting level of conditionals in F.

μ4      Data Flow Complexity        Oviedo
        E(F) increases with the number of inter-basic block variable
        references in F.

μ5      Fan-in/out Complexity       Henry
        E(F) increases with the number of formal parameters to F, and
        with the number of global data structures read or updated by F.

μ6      Data Structure Complexity   Munson
        E(P) increases with the complexity of the static data structures
        declared in P. The complexity of a scalar variable is constant.
        The complexity of an array increases with the number of
        dimensions and with the complexity of the element type. The
        complexity of a record increases with the number and complexity
        of its fields.

μ7      OO Metric                   Chidamber
        E(C) increases with (μ7a) the number of methods in C, (μ7b) the
        depth (distance from the root) of C in the inheritance tree,
        (μ7c) the number of direct subclasses of C, (μ7d) the number of
        other classes to which C is coupled*, (μ7e) the number of methods
        that can be executed in response to a message sent to an object
        of C, (μ7f) the degree to which C's methods do not reference the
        same set of instance variables. Note: μ7f measures cohesion;
        i.e., how strongly related the elements of a module are.

*Two classes are coupled if one uses the methods or instance variables of the other.
FIG. 9

{ int v, a=5; b=6;                    { int v, a=5; b=6;
  v = a + b;                            if (...) ...   (b is unchanged)
  if (b > 5)^T ...                      if (b < 7)^T a++;
  if (random(1,5) < 0)^F ...            v = (a > 5) ? v = b*b : v = b
}                                     }

FIG. 10a                              FIG. 10b
i=1;                      i=1; j=100;
while (i<100) {           while ((i<100) && ((j*j*(j+1)*(j+1))%4==0)^T) {
  ...                       ...
                            j=j*i+3;
}                         }
(c)                       (d)

FIG. 12
class C {                            class C' {
  method M1 (T1 a) {                   method M (T1 a; T2 c; int V) {
    S1^M1; ...; Sk^M1;                   if (V == p) { S1^M1; ...; Sk^M1; }
  }                                      else { S1^M2; ...; Sm^M2; }
  method M2 (T1 b; T2 c) {             }
    S1^M2; ...; Sm^M2;               }
  }
}

{ C x = new C;                       { C' x = new C';
  x.M1(a); x.M2(b, c); }               x.M(a, c, V=p);
                                       x.M(b, c, V=q); }

FIG. 18



class C {
  method m (int x)
    { S1; ...; Sk }
}

{ C x = new C;
  x.m(5); ... x.m(7);
}

class C1 {
  method m (int x)
    { S1^a; ...; Sk^a }
  method m1 (int x)
    { S1^c; ...; Sk^c }
}

class C2 inherits C1 {
  method m (int x)
    { S1^b; ...; Sk^b }
}

{ C1 x;
  if (P^?) x = new C1 else x = new C2;
  x.m(5); ...; x.m1(7);
}

FIG. 19
FIG. 20a

for(i=1,i<=n,i++)                 for(I=1,I<=n,I+=64)
  for(j=1,j<=n,j++)                 for(J=1,J<=n,J+=64)
    a[i,j]=b[j,i]                     for(i=I,i<=min(I+63,n),i++)
                                        for(j=J,j<=min(J+63,n),j++)
                                          a[i,j]=b[j,i]

FIG. 20b

for(i=2,i<(n-1),i++)              for(i=2,i<(n-2),i+=2) {
  a[i]+=a[i-1]*a[i+1]               a[i]+=a[i-1]*a[i+1];
                                    a[i+1]+=a[i]*a[i+2];
                                  }
                                  if(((n-2)%2)==1)
                                    a[n-1]+=a[n-2]*a[n]

FIG. 20c

for(i=1,i<n,i++) {                for(i=1,i<n,i++)
  a[i]+=c;                          a[i]+=c;
  x[i+1]=d+x[i+1]*a[i]            for(i=1,i<n,i++)
}                                   x[i+1]=d+x[i+1]*a[i]

FIG. 21a

FIG. 21b

FIG. 21c FIG. 21d
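
The loop unrolling of FIG. 20b can be illustrated with a small Java sketch. The class and method names are hypothetical and do not come from the patent. The sketch uses 0-based Java arrays and its own handling of the leftover iteration, and is intended only to show that the unrolled loop computes the same result as the original.

```java
// Hypothetical sketch of loop unrolling as an obfuscating transformation.
final class LoopTransformSketch {

    // Original loop: every interior element is updated from its neighbours.
    static int[] original(int[] in) {
        int[] a = in.clone();
        for (int i = 1; i < a.length - 1; i++) {
            a[i] += a[i - 1] * a[i + 1];
        }
        return a;
    }

    // Unrolled by a factor of two: each pass performs two consecutive
    // iterations of the original loop body, in the original order.
    static int[] unrolled(int[] in) {
        int[] a = in.clone();
        int i = 1;
        for (; i + 1 < a.length - 1; i += 2) {
            a[i] += a[i - 1] * a[i + 1];
            a[i + 1] += a[i] * a[i + 2];
        }
        // One iteration is left over when the iteration count is odd.
        if (i < a.length - 1) {
            a[i] += a[i - 1] * a[i + 1];
        }
        return a;
    }
}
```

Because the unrolled body executes the same statements in the same order, the loop-carried dependence on a[i-1] is preserved and the two versions agree for any input length.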
Encoding of a split boolean variable as two bits:

bits    value    index
0 0     False    0
0 1     True     1
1 0     True     2
1 1     False    3

[Lookup tables AND[A,B] and OR[A,B], each indexed by A and B in 0..3;
the individual table entries are not recoverable from this copy.]


(1)  bool A,B,C;      (1')  short a1,a2,b1,b2,c1,c2;
(2)  A = True;        (2')  a1=0; a2=1;
(3)  B = False;       (3')  b1=0; b2=0;
(4)  C = False;       (4')  c1=1; c2=1;
(5)  C = A & B;       (5')  x=AND[2*a1+a2,2*b1+b2]; c1=x/2; c2=x%2;
(6)  C = A & B;       (6')  c1=(a1 ^ a2) & (b1 ^ b2); c2=0;
(7)  C = A | B;       (7')  x=OR[2*a1+a2,2*b1+b2]; c1=x/2; c2=x%2;
(8)  if (A) ...;      (8')  x=2*a1+a2; if ((x==1) || (x==2)) ...;
(9)  if (B) ...;      (9')  if (b1 ^ b2) ...;
(10) if (C) ...;      (10') if (VAL[c1,c2]) ...;

FIG. 21e
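
The variable splitting shown above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the class name SplitBool and its members are hypothetical, and the contents of the AND table are this sketch's own choice. The only property taken from the figure is the encoding, in which a pair of bits encodes True when the bits differ and False when they are equal.

```java
// Hypothetical sketch: a boolean split into two small integers (hi, lo).
final class SplitBool {
    final int hi; // plays the role of a1, b1, c1
    final int lo; // plays the role of a2, b2, c2

    SplitBool(int hi, int lo) {
        this.hi = hi;
        this.lo = lo;
    }

    // VAL[hi, lo]: decode the pair back to an ordinary boolean.
    boolean val() {
        return (hi ^ lo) == 1;
    }

    // Lookup-table AND indexed by 2*hi+lo, in the style of statement (5').
    // The entries are an assumption of this sketch: any table whose result
    // has differing bits for True and equal bits for False would do.
    private static final int[][] AND = new int[4][4];
    static {
        for (int a = 0; a < 4; a++) {
            for (int b = 0; b < 4; b++) {
                boolean av = ((a >> 1) ^ (a & 1)) == 1;
                boolean bv = ((b >> 1) ^ (b & 1)) == 1;
                // Alternate between the two encodings of each truth value
                // so the table does not read like a plain AND.
                int t = ((a + b) % 2 == 0) ? 1 : 2; // 01 or 10 encode True
                int f = ((a + b) % 2 == 0) ? 3 : 0; // 11 or 00 encode False
                AND[a][b] = (av && bv) ? t : f;
            }
        }
    }

    SplitBool and(SplitBool other) {
        int x = AND[2 * hi + lo][2 * other.hi + other.lo];
        return new SplitBool(x / 2, x % 2);
    }
}
```

Since each truth value has two encodings, the same logical result can appear as different bit pairs at different points in the program, which is what makes the split variable harder to follow.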
String G (int n) {
  int i=0,k;
  String S;
  while (1) {
    L1:  if (n==1) {S[i++]="A"; k=0; goto L6};
    L2:  if (n==2) {S[i++]="B"; k=-2; goto L6};
    L3:  if (n==3) {S[i++]="C"; goto L8};
    L4:  if (n==4) {S[i++]="X"; goto L9};
    L5:  if (n==5) {S[i++]="C"; goto L11};
         if (n>12) goto L1;
    L6:  if (k++<=2) {S[i++]="A"; goto L6} else goto L8;
    L8:  return S;
    L9:  S[i++]="C"; goto L10;
    L10: S[i++]="B"; goto L8;
    L11: S[i++]="C"; goto L12;
    L12: goto L10;
  }
}

FIG. 22



FIG. 23a

Z(X + r, Y) = 2^{32} \cdot Y + (r + X) = Z(X, Y) + r
Z(X, Y + r) = 2^{32} \cdot (Y + r) + X = Z(X, Y) + r \cdot 2^{32}
Z(X \cdot r, Y) = 2^{32} \cdot Y + X \cdot r = Z(X, Y) + (r - 1) \cdot X
Z(X, Y \cdot r) = 2^{32} \cdot Y \cdot r + X = Z(X, Y) + (r - 1) \cdot 2^{32} \cdot Y

FIG. 23b

(1) int X=45, Y=95;      (1') long Z=167759066119551045;
(2) X += 5;              (2') Z += 5;
(3) Y += 11;             (3') Z += 47244640256;
(4) X *= c;              (4') Z += (c-1)*(Z & 4294967295);
(5) Y *= d;              (5') Z += (d-1)*(Z & 18446744069414584320);
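
The merging of two 32-bit variables into one 64-bit variable can be sketched in Java as follows. The class MergedPair and its method names are hypothetical; the arithmetic assumes the packing Z = 2^32 * Y + X with X in the low half and Y in the high half. The sketch also assumes non-negative values that do not overflow their 32-bit half, since a carry out of X would corrupt Y.

```java
// Hypothetical sketch: X and Y packed into one long as Z = 2^32 * Y + X.
final class MergedPair {
    private static final long LOW_MASK = 0xFFFFFFFFL;          // 4294967295
    private static final long HIGH_MASK = 0xFFFFFFFF00000000L; // upper 32 bits

    private long z;

    MergedPair(int x, int y) {
        this.z = ((long) y << 32) + (x & LOW_MASK);
    }

    // X += r becomes Z += r
    void addToX(int r) { z += r; }

    // Y += r becomes Z += r * 2^32
    void addToY(int r) { z += (long) r << 32; }

    // X *= c becomes Z += (c - 1) * X, with X extracted by masking
    void mulX(int c) { z += (c - 1L) * (z & LOW_MASK); }

    // Y *= d becomes Z += (d - 1) * 2^32 * Y, with 2^32 * Y extracted by masking
    void mulY(int d) { z += (d - 1L) * (z & HIGH_MASK); }

    int x() { return (int) (z & LOW_MASK); }
    int y() { return (int) (z >>> 32); }
}
```

For example, adding 11 to Y adds 11 * 2^32 = 47244640256 to Z, which matches the constant in statement (3') of the figure.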
(1) int A[9];                    (1') int A1[4],A2[4];
(2) A[i] = ...;                  (2') if ((i%2)==0) A1[i/2]=...;
                                      else A2[i/2]=...;

(3) int B[9],C[19];              (3') int BC[29];
(4) B[i] = ...;                  (4') BC[3*i] = ...;
(5) C[i] = ...;                  (5') BC[i/2*3+1+i%2] = ...;

(6) int D[9];                    (6') int D1[1,4];
(7) for(i=0;i<=8;i++)            (7') for(j=0;j<=1;j++)
      D[i]=2*D[i+1];                    for(k=0;k<=4;k++)
                                          if (k==4)
                                            D1[j,k]=2*D1[j+1,0];
                                          else
                                            D1[j,k]=2*D1[j,k+1];

(8) int E[2,2];                  (8') int E1[8];
(9) for(i=0;i<=2;i++)            (9') for(i=0;i<=8;i++)
      for(j=0;j<=2;j++)                 swap(E1[i], E1[3*(i%3)+i/3]);
        swap(E[i,j], E[j,i]);



FIG. 24a 
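
The array splitting of statements (1) through (2') can be sketched in Java as follows. The class SplitArray and its accessors are hypothetical names introduced for illustration; the sketch assumes that even-indexed elements go to one subarray and odd-indexed elements to the other.

```java
// Hypothetical sketch: one logical array stored as two physical subarrays.
final class SplitArray {
    private final int[] even; // plays the role of A1 (even indices)
    private final int[] odd;  // plays the role of A2 (odd indices)

    SplitArray(int length) {
        even = new int[(length + 1) / 2];
        odd = new int[length / 2];
    }

    // A[i] = value, redirected to the subarray that owns index i.
    void set(int i, int value) {
        if (i % 2 == 0) {
            even[i / 2] = value;
        } else {
            odd[i / 2] = value;
        }
    }

    // Read of A[i], redirected in the same way.
    int get(int i) {
        return (i % 2 == 0) ? even[i / 2] : odd[i / 2];
    }
}
```

In an obfuscated program the index arithmetic would be expanded at every access site rather than hidden behind accessors; the accessors are used here only to keep the sketch short.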
Node g, h;

method P(..., Node f) {
  /* 1 */ g = g.Move();
          h = h.Move();
  /* 2 */ h = h.Insert(new Node);
  /* 3 */ x.R(.., f.Move());
  /* 4 */ if (f==g)^? ...
  /* 5 */ if (g==h)^F ...
  /* 6 */ f.Token=False;
          g.Token=True;
  /* 7 */ if (f.Token)^? ...
  /* 8 */ f.Token=True;
          h.Token=False;
  /* 9 */ if (f.Token)^T ...
}

FIG. 26



[Diagram: program M, containing "if (P^T) ...", and program M',
containing "if (True) ...", are run on the same Input; their results,
Output and Output', are compared (Identical?).]




FIG. 30 
thread S {
  int R;
  while (1) {
    R = random(1,C);
    X = R*R;
    sleep(3);
  }
}

thread T {
  int R;
  while (1) {
    R = random(1,C);
    Y = 7*R*R;
    sleep(2);
    X *= X;
    sleep(5);
  }
}

int X, Y;
const C = sqrt(maxint)/10;
main() {
  S.run(); T.run();
  ...
  if ((Y-1) == X)^F ...
}

FIG. 27
OBFUSCATION TECHNIQUES FOR
ENHANCING SOFTWARE SECURITY

FIELD OF THE INVENTION 

The present invention relates to methods and apparatus 
for preventing, or at least hampering, interpretation, 
decoding, or reverse engineering of software. More 
particularly, although not exclusively, the present invention 
relates to methods and apparatus for increasing the structural 
and logical complexity of software by inserting, removing, 
or rearranging identifiable structure or information from the 
software in such a way as to exacerbate the difficulty of the 
process of decompilation or reverse engineering. 

BACKGROUND 

The nature of software renders it susceptible to analysis 
and copying by third parties. There have been considerable 
efforts to enhance software security, which have met with 
mixed success. Such security concerns relate to the need to 
prevent unauthorized copying of software and a desire to 
conceal programming techniques in which such techniques 
can be determined via reverse engineering. 

Established legal avenues, such as copyright, provide a 
measure of legislative protection. However, enforcing legal 
rights created under such regimes can be expensive and time 
consuming. Further, the protection afforded to software 
under copyright does not cover programming techniques. 
Such techniques (i.e., the function as opposed to the form of 
the software) are legally difficult to protect. A reverse 
engineer could escape infringement by rewriting the relevant 
software, ab initio, based on a detailed knowledge of the 
function of the software in question. Such knowledge can be 
derived from analyzing the data structures, abstractions, and 
organization of the code. 

Software patents provide more comprehensive protection.
However, it is clearly an advantage to couple legal protection
of software with technical protection.

Previous approaches to the protection of proprietary software
have either used encryption-based hardware solutions
or have been based on simple rearrangements of the source
code structure. Hardware-based techniques are non-ideal in
that they are generally expensive and are tied to a specific
platform or hardware add-on. Software solutions typically
include trivial code obfuscators, such as the Crema obfuscator
for Java™. Some obfuscators target the lexical structure
of the application and typically remove source code
formatting and comments and rename variables. However,
such an obfuscation technique does not provide sufficient
protection against malicious reverse engineering: reverse
engineering is a problem regardless of the form in which the
software is distributed. Further, the problem is exacerbated
when the software is distributed in hardware-independent
formats that retain much or all of the information in the
original source code. Examples of such formats are Java™
bytecode and the Architecture Neutral Distribution Format
(ANDF).

Software development can represent a significant investment
in time, effort, and skill by a programmer. In the
commercial context, the ability to prevent a competitor from
copying proprietary techniques can be critical.

SUMMARY 

The present invention provides methods and apparatus for
obfuscation techniques for software security, such as computer
implemented methods for reducing the susceptibility
of software to reverse engineering (or to provide the public
with a useful choice). In one embodiment, a computer
implemented method for obfuscating code includes testing
for completion of supplying one or more obfuscation
transformations to the code, selecting a subset of the code to
obfuscate, selecting an obfuscating transform to apply,
applying the transformation, and returning to the completion
testing step.

In an alternative embodiment, the present invention
relates to a method of controlling a computer so that
software running on, stored on, or manipulated by the
computer exhibits a predetermined and controlled degree of
resistance to reverse engineering, including applying
selected obfuscating transformations to selected parts of the
software, in which a level of obfuscation is achieved using
a selected obfuscation transformation so as to provide a
required degree of resistance to reverse engineering,
effectiveness in operation of the software and size of transformed
software, and updating the software to reflect the obfuscating
transformations.

In a preferred embodiment, the present invention provides
a computer implemented method for enhancing software
security, including identifying one or more source code input
files corresponding to the source software for the application
to be processed, selecting a required level of obfuscation
(e.g., potency), selecting a maximum execution time or
space penalty (e.g., cost), reading and parsing the input files,
optionally along with any library or supplemental files read
directly or indirectly by the source code, providing
information identifying data types, data structures, and control
structures used by the application to be processed, and
constructing appropriate tables to store this information,
preprocessing information about the application, in response
to the preprocessing step, selecting and applying obfuscating
code transformations to source code objects, repeating the
obfuscating code transformation step until the required
potency has been achieved or the maximum cost has been
exceeded, and outputting the transformed software.

Preferably, the information about the application is
obtained using various static analysis techniques and
dynamic analysis techniques. The static analysis techniques
include inter-procedural dataflow analysis and data
dependence analysis. The dynamic analysis techniques include
profiling, and optionally, information can be obtained via a
user. Profiling can be used to determine the level of
obfuscation, which can be applied to a particular source code
object. Transformations can include control transformations
created using opaque constructs in which an opaque
construct is any mathematical object that is inexpensive to
execute from a performance standpoint, simple for an
obfuscator to construct, and expensive for a deobfuscator to break.
Preferably, opaque constructs can be constructed using
aliasing and concurrency techniques. Information about the
source application can also be obtained using pragmatic
analysis, which determines the nature of language constructs
and programming idioms the application contains.

The potency of an obfuscation transformation can be
evaluated using software complexity metrics. Obfuscation
code transformations can be applied to any language
constructs: for example, modules, classes, or subroutines can be
split or merged; new control and data structures can be
created; and original control and data structures can be
modified. Preferably, the new constructs added to the
transformed application are selected to be as similar as possible
to those in the source application, based on the pragmatic
information gathered during preprocessing. The method can
produce subsidiary files including information about which
obfuscating transformations have been applied and information
relating obfuscated code of the transformed application
to the source software.

Preferably, the obfuscation transformations are selected to
preserve the observable behavior of the software such that if
P is the untransformed software, and P' is the transformed
software, P and P' have the same observable behavior. More
particularly, if P fails to terminate or terminates with an error
condition, then P' may or may not terminate, otherwise P'
terminates and produces the same output as P. Observable
behavior includes effects experienced by a user, but P and P'
may run with different detailed behavior unobservable by a
user. For example, detailed behavior of P and P' that can be
different includes file creation, memory usage, and network
communication.

In one embodiment, the present invention also provides a
deobfuscating tool adapted to remove obfuscations from an
obfuscated application by use of slicing, partial evaluation,
dataflow analysis, or statistical analysis.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will now be described by way of
example only and with reference to the drawings in which:

FIG. 1 illustrates a data processing system in accordance
with the teachings of the present invention;

FIG. 2 illustrates a classification of software protection
including categories of obfuscating transformations;

FIGS. 3a and 3b show techniques for providing software
security by (a) server-side execution and (b) partial
server-side execution;

FIGS. 4a and 4b show techniques for providing software
security by (a) using encryption and (b) using signed native
code;

FIG. 5 shows a technique for providing software security
through obfuscation;

FIG. 6 illustrates the architecture of an example of an
obfuscating tool suitable for use with Java™ applications;

FIG. 7 is a table that tabulates a selection of known
software complexity metrics;

FIGS. 8a and 8b illustrate the resilience of an obfuscating
transformation;

FIG. 9 shows different types of opaque predicates;

FIGS. 10a and 10b provide examples of (a) trivial opaque
constructs and (b) weak opaque constructs;

FIG. 11 illustrates an example of a computation
transformation (branch insertion transformation);

FIGS. 12a through 12d illustrate a loop condition
insertion transformation;

FIG. 13 illustrates a transformation that transforms
reducible flowgraphs into non-reducible flowgraphs;

FIG. 14 shows that a section of code can be parallelized
if it contains no data dependencies;

FIG. 15 shows that a section of code that contains no data
dependencies can be split into concurrent threads by
inserting appropriate synchronization primitives;

FIG. 16 shows how procedures P and Q are inlined at their
call-sites and then removed from the code;

FIG. 17 illustrates inlining method calls;

FIG. 18 shows a technique for interleaving two methods
declared in the same class;

FIG. 19 shows a technique for creating several different
versions of a method by applying different sets of
obfuscating transformations to the original code;

FIGS. 20a through 20c provide examples of loop
transformations including (a) loop blocking, (b) loop unrolling,
and (c) loop fission;

FIG. 21 shows a variable splitting example;

FIG. 22 provides a function constructed to obfuscate
strings "AAA", "BAAAA", and "CCB";

FIG. 23 shows an example merging two 32-bit variables
x and y into one 64-bit variable Z;

FIG. 24 illustrates an example of a data transformation for
array restructuring;

FIG. 25 illustrates modifications of an inheritance
hierarchy;

FIG. 26 illustrates opaque predicates constructed from
objects and aliases;

FIG. 27 provides an example of opaque constructs using
threads;

FIGS. 28a through 28d illustrate obfuscation vs.
deobfuscation in which (a) shows an original program including
three statements, S1 through S3, being obfuscated, (b) shows a
deobfuscator identifying "constant" opaque predicates, (c) shows
the deobfuscator determining the common code in the
statements, and (d) shows the deobfuscator applying some
final simplifications and returning the program to its original
form;

FIG. 29 shows an architecture of a Java™ deobfuscation
tool;

FIG. 30 shows an example of statistical analysis used for
evaluation;

FIGS. 31a and 31b provide tables of an overview of
various obfuscating transforms; and

FIG. 32 provides an overview of various opaque
constructs.

DETAILED DESCRIPTION

The following description will be provided in the context
of a Java obfuscation tool which is currently being developed
by the applicants. However, it will be apparent to one
of ordinary skill in the art that the present techniques are
applicable to other programming languages and the invention
is not to be construed as restricted to Java™ applications.
The implementation of the present invention in the
context of other programming languages is considered to be
within the purview of one of ordinary skill in the art. The
exemplary embodiment that follows is, for clarity,
specifically targeted at a Java™ obfuscating tool.

In the description below, the following nomenclature will
be used. P is the input application to be obfuscated; P' is the
transformed application; T is a transformation such that T
transforms P into P'. P(T)P' is an obfuscating transformation
if P and P' have the same observable behavior. Observable
behavior is defined generally as behavior experienced by the
user. Thus, P' may have side effects such as creating files that
P does not, so long as these side effects are not experienced
by the user. P and P' do not need to be equally efficient.

Exemplary Hardware

FIG. 1 illustrates a data processing system in accordance
with the teachings of the present invention. FIG. 1 shows a
computer 100, which includes three major elements.
Computer 100 includes an input/output (I/O) circuit 120, which
is used to communicate information in appropriately
structured form to and from other portions of computer 100.
Computer 100 includes a central processing unit (CPU) 130
in communication with I/O circuit 120 and a memory 140
(e.g., volatile and non-volatile memory). These elements are
those typically found in most general purpose computers
and, in fact, computer 100 is intended to be representative of
a broad category of data processing devices. A raster display
monitor 160 is shown in communication with I/O circuit 120
and is used to display images generated by CPU 130. Any
well known variety of cathode ray tube (CRT) or other type
of display can be used as display 160. A conventional
keyboard 150 is also shown in communication with I/O 120.
It will be appreciated by one of ordinary skill in the art that
computer 100 can be part of a larger system. For example,
computer 100 can also be in communication with a network
(e.g., connected to a local area network (LAN)).

In particular, computer 100 can include obfuscating
circuitry for enhancing software security in accordance with
the teachings of the present invention, or as will be
appreciated by one of ordinary skill in the art, the present
invention can be implemented in software executed by
computer 100 (e.g., the software can be stored in memory
140 and executed on CPU 130). For example, an
unobfuscated program P (e.g., an application), stored in memory
140, can be obfuscated by an obfuscator executing on CPU
130 to provide an obfuscated program P', stored in memory
140, in accordance with one embodiment of the present
invention.

OVERVIEW OF THE DETAILED DESCRIPTION

FIG. 6 shows the architecture of a Java™ obfuscator.
According to the inventive method, Java™ application class
files are passed along with any library files. An inheritance
tree is constructed as well as a symbol table, providing type
information for all symbols and control flow graphs for all
methods. The user may optionally provide profiling data
files as generated by Java™ profiling tools. This information
can be used to guide the obfuscator to ensure that frequently
executed parts of the application are not obfuscated by very
expensive transformations. Information is gathered about
the application using standard compiler techniques such as
interprocedural dataflow analysis and data dependence
analysis. Some can be provided by the user and some by
specialized techniques. The information is used to select and
apply the appropriate code transformations.

Appropriate transformations are selected. The governing
criteria used in selecting the most suitable transformation
include the requirement that the chosen transformation blend
in naturally with the rest of the code. This can be dealt with
by favoring transformations with a high appropriateness
value. A further requirement is that transformations which
yield a high level of obfuscation with low execution time
penalty should be favored. This latter point is accomplished
by selecting transformations that maximize potency and
resilience, and minimize cost.

An obfuscation priority is allocated to a source code
object. This will reflect how important it is to obfuscate the
contents of the source code object. For example, if a
particular source code object contains highly sensitive
proprietary material, then the obfuscation priority will be high.
An execution time rank is determined for each method,
which equals 1 if more time is spent executing the method
than any other.

The application is then obfuscated by building the
appropriate internal data structures, the mapping from each source
code object to the appropriate transformation, the
obfuscation priority, and the execution time rank. The obfuscating
transformations are applied until the required level of
obfuscation has been achieved or until the maximum execution
time penalty is exceeded. The transformed application is
then written.
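
The selection and termination rules described above can be sketched as a simple greedy loop in Java. Everything named here is hypothetical: the Transform class, its numeric potency, resilience, and cost fields, and the scoring formula are assumptions made for illustration, since the description does not specify how the three qualities are combined. This sketch also stops just before the cost budget would be exceeded, which is one possible reading of the stopping condition.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the obfuscation loop: pick the best remaining
// transformation until the required potency is reached or the cost
// budget would be exceeded.
final class ObfuscationLoopSketch {

    // A candidate transformation with made-up numeric quality scores.
    static final class Transform {
        final String name;
        final double potency;
        final double resilience;
        final double cost;

        Transform(String name, double potency, double resilience, double cost) {
            this.name = name;
            this.potency = potency;
            this.resilience = resilience;
            this.cost = cost;
        }

        // Assumed scoring: favor high potency and resilience, low cost.
        double score() {
            return (potency + resilience) / (1.0 + cost);
        }
    }

    // Returns the names of the transformations applied, in order.
    static List<String> run(List<Transform> candidates,
                            double requiredPotency, double maxCost) {
        List<Transform> remaining = new ArrayList<>(candidates);
        List<String> applied = new ArrayList<>();
        double potency = 0.0;
        double cost = 0.0;
        while (potency < requiredPotency && !remaining.isEmpty()) {
            Transform best = remaining.get(0);
            for (Transform t : remaining) {
                if (t.score() > best.score()) {
                    best = t;
                }
            }
            remaining.remove(best);
            if (cost + best.cost > maxCost) {
                break; // the cost budget would be exceeded
            }
            applied.add(best.name);
            potency += best.potency;
            cost += best.cost;
        }
        return applied;
    }
}
```

A real tool would also weigh the obfuscation priority and execution time rank of each source code object when choosing where to apply a transformation; the sketch omits both to stay short.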



The output of the obfuscation tool is a new application 
that is functionally equivalent to the original. The tool can 
also produce Java™ source files annotated with information 
about which transformations have been applied and how the 
obfuscated code relates to the original application. 

A number of examples of obfuscating transformations 
will now be described, again in the context of a Java™ 
obfuscator. 

Obfuscating transformations can be evaluated and
classified according to their quality. The quality of a
transformation can be expressed according to its potency, resilience,
and cost. The potency of a transformation is related to how
obscure P' is in relation to P. Any such metric will be
relatively vague as it necessarily depends on human
cognitive abilities. For the present purposes it is sufficient to
consider the potency of a transformation as a measure of the
usefulness of the transformation. The resilience of a
transformation measures how well a transformation holds up to
an attack from an automatic deobfuscator. This is a
combination of two factors: programmer effort and deobfuscator
effort. Resilience can be measured on a scale from trivial to
one-way. One-way transformations are extreme in that they
cannot be reversed. The third component is transformation
execution cost. This is the execution time or space penalty
incurred as a result of using the transformed application P'.
Further details of transformation evaluation are discussed
below in the detailed description of the preferred
embodiments. The main classification of obfuscating
transformations is shown in FIG. 2c with details given in FIGS. 2e
through 2g.

Examples of obfuscating transforms are as follows:
Obfuscating transforms may be categorized as follows:
control obfuscations, data obfuscations, layout obfuscations,
and preventive obfuscations. Some examples of these are
discussed below.

Control obfuscations include aggregation 
transformations, ordering transformations, and computation 
transformations. 

Computation transformations include: concealing real 
control flow behind irrelevant non-functional statements; 
introducing code sequences at the object code level for 
which there exist no corresponding high-level language 
constructs; and removing real control flow abstractions or 
introducing spurious ones. 

Considering the first classification (control flow), the
Cyclomatic and Nesting complexity metrics suggest that
there is a strong correlation between the perceived
complexity of a piece of code and the number of predicates it
contains. Opaque predicates enable the construction of
transformations which introduce new predicates into the
program.

Referring to FIG. 11a, an opaque predicate P^T is inserted
into the basic block S where S=S1 . . . Sn. This splits S in
half. The P^T predicate is irrelevant code, because it will
always evaluate to True. In FIG. 11b, S is again broken into
two halves, which are transformed into two different
obfuscated versions S^a and S^b. Therefore, it will not be obvious
to a reverse engineer that S^a and S^b perform the same
function. FIG. 11c is similar to FIG. 11b, however, a bug is
introduced into S^b. The P^T predicate always selects the
correct version of the code, S^a.
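
The branch insertion of FIG. 11c can be sketched in Java as follows. The class, the method names, and the particular predicate are assumptions chosen for illustration and are not taken from the patent. The predicate relies on the fact that x*x + x is the product of two consecutive integers and is therefore always even, which also holds under 32-bit wraparound because 2^32 is even.

```java
// Hypothetical sketch of branch insertion with an opaquely true predicate.
final class BranchInsertionSketch {

    // Opaquely true: x*x + x = x*(x+1) is always even.
    static boolean opaquelyTrue(int x) {
        return (x * x + x) % 2 == 0;
    }

    // Original basic block S = S1; S2
    static int original(int a, int b) {
        int sum = a + b;  // S1
        return sum * 2;   // S2
    }

    // Transformed block: the predicate guards two versions of S2.
    static int obfuscated(int a, int b) {
        int sum = a + b;          // S1
        if (opaquelyTrue(a)) {
            return sum << 1;      // S^a: correct, but written differently
        } else {
            return sum * 3;       // S^b: deliberately buggy, never executed
        }
    }
}
```

A predicate this simple would be easy for an automatic deobfuscator to evaluate; it stands in here for the stronger opaque constructs built from aliasing and concurrency that the description refers to.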

Another type of obfuscation transformation is a data
transformation. An example of a data transformation is
deconstructing arrays to increase the complexity of code. An
array can be split into several subarrays, two or more arrays
can be merged into a single array, or the dimensions of an
array can be increased (folding) or decreased (flattening).
FIG. 24 illustrates a number of examples of array
transformations. In statements (1-2), an array A is split into two
subarrays A1 and A2. A1 contains elements with even
indices and A2 contains elements with odd indices.
Statements (3-5) illustrate how two integer arrays B and C can be
interleaved to produce an array BC. The elements from B
and C are evenly spread throughout the transformed array.
Statements (6-7) illustrate folding of array D into array D1.
Such transformations introduce previously absent data
structure or remove existing data structure. This can greatly
increase the obscurity of the program as, for example, in
declaring a 2-dimensional array a programmer usually does
so for a purpose, with the chosen structure mapping onto the
corresponding data. If that array is flattened into a 1-d
structure, a reverse engineer would be deprived of valuable
pragmatic information.

Another example of an obfuscating transformation is a
preventive transformation. In contrast to control or data
transformations, the main goal of preventive transformations
is not to obscure the program to a human reader, but to make
known automatic deobfuscation techniques more difficult or
to exploit known problems in current deobfuscators or
decompilers. Such transformations are known as inherent
and targeted respectively. An example of an inherent
preventive transformation is reordering a for-loop to run
backward. Such reordering is possible if the loop has no
loop-carried data dependencies. A deobfuscator could perform the
same analysis and reorder the loop to forward execution.
However, if a bogus data dependency is added to the
reversed loop, the identification of the loop and its
reordering would be prevented.

Further specific examples of obfuscating transformations
are discussed below in the detailed description of the
preferred embodiments.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENTS

It has become more and more common to distribute
software in forms that retain most or all of the information
present in the original source code. An important example is
Java bytecode. Because such codes are easy to decompile,

application. Having gained physical access to the
application, the reverse engineer can decompile it (using
disassemblers or decompilers) and then analyze its data
structures and control flow. This can either be done manually
or with the aid of reverse engineering tools, such as program
slicers.

Reverse engineering is not a new problem. Until recently,
however, it is a problem that has received relatively little
attention from software developers, because most programs
are large, monolithic, and shipped as stripped, native code,
making them difficult (although never impossible) to reverse
engineer.

However, this situation is changing as it is becoming more
and more common to distribute software in forms that are
easy to decompile and reverse engineer. Important examples
include Java bytecode and the Architecture Neutral
Distribution Format (ANDF). Java applications in particular pose
a problem to software developers. They are distributed over
the Internet as Java class files, a hardware-independent
virtual machine code that retains virtually all the information
of the original Java source. Hence, these class files are easy
to decompile. Moreover, because much of the computation
takes place in standard libraries, Java programs are often
small in size and therefore relatively easy to reverse
engineer.

The main concern of Java developers is not outright
reengineering of entire applications. There is relatively little
value in such behavior, because it clearly violates copyright
law [29] and can be handled through litigation. Rather,
developers are mostly frightened by the prospect of a
competitor being able to extract proprietary algorithms and
data structures from their applications in order to incorporate
them into their own programs. Not only does it give the
competitor a commercial edge (by cutting development time
and cost), but it is also difficult to detect and pursue legally.
The last point is particularly valid for small developers who
may ill afford lengthy legal battles against powerful
corporations with unlimited legal budgets.

An overview of various forms of protection for providing
legal protection or security for software is provided in FIG.

they increase the risk of malicious reverse engineering 2. FIG. 2 provides a classification of (a) kinds of protection 

attacks. against malicious reverse engineering, (b) the quality of an 

Accordingly, several techniques for technical protection obfuscating transformation, (c) information targeted by an 

of software secrets are provided in accordance with one 45 obfuscating transformation, (d) layout obfuscations, (e) data 

embodiment of the present invention. In the detailed obfuscations, (f) control obfuscations, and (g) preventive 

description of the preferred embodiments, we will argue that obfuscations. 

automatic code obfuscation is currently the most viable The various forms of technical protection of intellectual 
method for preventing reverse engineering. We then property, which are available to software developers are 
describe the design of a code obfuscator, an obfuscation tool 50 discussed below. We will restrict our discussion to Java 
that converts a program into an equivalent one that is more programs distributed over the Internet as Java class-files, 
difficult to understand and reverse engineer. although most of our results will apply to other languages 
The obfuscator is based on the application of code and architecture-neutral formats as well, as will be apparent 
transformations, in many cases similar to those used by 10 onc of ordina ry skill in the art. We will argue that the only 
compiler optimizers. We describe a large number of such 55 reasonable approach to the protection of mobile code is code 
transformations, classify them, and evaluate them with obfuscation. We will furthermore present a number of obfus- 
respect to their potency (e.g., To what degree is a human catin S transformations, classify them according to effective- 
reader confused?), resilience (e.g., How well are automatic ness and efficiency, and show how they can be put to use in 
deobfuscation attacks resisted?), and cost (e.g., How much an automatic obfuscation tool. 

performance overhead is added to the application?). 60 The remainder of the detailed description of the preferred 

We finally describe various deobfuscation techniques embodiments is structured as follows. In Section 2, we give 

(such as program slicing) and possible countermeasures an an overview of different forms of technical protection 

obfuscator could employ against them. against software theft and argue that code obfuscation cur- 
rently affords the most economical prevention. In Section 3, 

1 INTRODUCTION 65 we g j ve a br j e f overv i ew 0 f the design of Kava, a code 

Given enough time, effort, and determination, a compe- obfuscator for Java, which is currently under construction, 

tent programmer will always be able to reverse engineer any Sections 4 and 5 describe the criteria we use to classify and 
evaluate different types of obfuscating transformations. Sections
6, 7, 8, and 9 present a catalogue of obfuscating
transformations. In Section 10, we give more detailed obfuscation
algorithms. In Section 11, we conclude with a summary
of our results and a discussion of future directions of
code obfuscation.

2 PROTECTING INTELLECTUAL PROPERTY

Consider the following scenario. Alice is a small software
developer who wants to make her applications available to
users over the Internet, presumably at a charge. Bob is a rival
developer who feels that he could gain a commercial edge
over Alice if he had access to her application's key algorithms
and data structures.

This can be seen as a two-player game between two
adversaries: the software developer (Alice) who tries to
protect her code from attack, and the reverse engineer (Bob)
whose task it is to analyze the application and convert it into
a form that is easy to read and understand. Note that it is not
necessary for Bob to convert the application back to something
close to Alice's original source; all that is necessary is
that the reverse engineered code be understandable by Bob
and his programmers. Note also that it may not be necessary
for Alice to protect her entire application from Bob; it
probably consists mostly of "bread-and-butter code" that is
of no real interest to a competitor.

Alice can protect her code from Bob's attack using either
legal or technical protection such as shown in FIG. 2a,
which is discussed above. While copyright law does cover
software artifacts, economic realities make it difficult for a
small company like Alice's to enforce the law against a
larger and more powerful competitor. A more attractive
solution is for Alice to protect her code by making reverse
engineering so technically difficult that it becomes impossible
or at the very least economically inviable. Some early
attempts at technical protection are described by Gosler.
(James R. Gosler. Software protection: Myth or reality? In
CRYPTO'85-Advances in Cryptology, pages 140-157,
August 1985).

The most secure approach is for Alice not to sell her
application at all, but rather sell its services. In other words,
users never gain access to the application itself but rather
connect to Alice's site to run the program remotely as shown
in FIG. 3a, paying a small amount of electronic money every
time. The advantage to Alice is that Bob will never gain
physical access to the application and hence will not be able
to reverse engineer it. The downside is of course that, due to
limits on network bandwidth and latency, the application
may perform much worse than if it had run locally on the
user's site. A partial solution is to break the application into
two parts: a public part that runs locally on the user's site,
and a private part (that contains the algorithms that Alice
wants to protect) that is run remotely, for example, as shown
in FIG. 3b.

Another approach would be for Alice to encrypt her code
before it is sent off to the users, for example, as shown in
FIG. 4a. Unfortunately, this only works if the entire
decryption/execution process takes place in hardware. Such
systems are described in Herzberg (Amir Herzberg and
Shlomit S. Pinter. Public protection of software. ACM
Transactions on Computer Systems, 5(4):371-393, November
1987.) and Wilhelm (Uwe G. Wilhelm. Cryptographically
protected objects. http://lsewww.epfl.ch/~wilhelm/CryPO.html,
1997). If the code is executed in software by
a virtual machine interpreter (as is most often the case with
Java bytecodes), then it will always be possible for Bob to
intercept and decompile the decrypted code.

The Java™ programming language has gained popularity
mainly because of its architecture neutral bytecode. While
this clearly facilitates mobile code, it does decrease the
performance by an order of magnitude in comparison to
native code. Predictably, this has led to the development of
just-in-time compilers that translate Java bytecodes to native
code on-the-fly. Alice could make use of such translators to
create native code versions of her application for all popular
architectures. When downloading the application, the user's
site would have to identify the architecture/operating system
combination it is running, and the corresponding version
would be transmitted, for example, as shown in FIG. 4b.
Only having access to the native code will make Bob's task
more difficult, although not impossible. There is a further
complication with transmitting native code. The problem is
that, unlike Java bytecodes, which are subjected to bytecode
verification before execution, native codes cannot be
run with complete security on the user's machine. If Alice is
a trusted member of the community, the user may accept her
assurances that the application does not do anything harmful
at the user's end. To make sure that no one tries to contaminate
the application, Alice would have to digitally sign the
codes as they are being transmitted, to prove to the user that
the code was the original one written by her.

The final approach we are going to consider is code
obfuscation, for example, as shown in FIG. 5. The basic idea
is for Alice to run her application through an obfuscator, a
program that transforms the application into one that is
functionally identical to the original but which is much more
difficult for Bob to understand. It is our belief that obfuscation
is a viable technique for protecting software trade
secrets that has yet to receive the attention that it deserves.

Unlike server-side execution, code obfuscation can never
completely protect an application from malicious reverse
engineering efforts. Given enough time and determination,
Bob will always be able to dissect Alice's application to
retrieve its important algorithms and data structures. To aid
this effort, Bob may try to run the obfuscated code through
an automatic deobfuscator that attempts to undo the obfuscating
transformations.

Hence, the level of security from reverse engineering that
an obfuscator adds to an application depends on, for
example, (a) the sophistication of the transformations
employed by the obfuscator, (b) the power of the available
deobfuscation algorithms, and (c) the amount of resources
(time and space) available to the deobfuscator. Ideally, we
would like to mimic the situation in current public-key
crypto systems, in which there is a dramatic difference in the
cost of encryption (finding large primes is easy) and decryption
(factoring large numbers is difficult). We will see that
there are, in fact, obfuscating transformations that can be
applied in polynomial time but which require exponential
time to deobfuscate, as discussed below.

3 THE DESIGN OF A JAVA OBFUSCATOR

FIG. 6 shows an architecture of Kava, the Java obfuscator.
The main input to the tool is a set of Java class files and the
obfuscation level required by the user. The user may optionally
provide files of profiling data, as generated by Java
profiling tools. This information can be used to guide the
obfuscator to make sure that frequently executed parts of the
application are not obfuscated by very expensive transformations.
Input to the tool is a Java application, given as a set
of Java class files. The user also selects the required level of
obfuscation (e.g., potency) and the maximum execution
time/space penalty that the obfuscator is allowed to add to
the application (the cost). Kava reads and parses the class
files along with any library files referenced directly or
indirectly. A complete inheritance tree is constructed, as well
as a symbol table giving type information for all symbols,
and control flow graphs for all methods.

Kava contains a large pool of code transformations, which
are described below. Before these can be applied, however,
a preprocessing pass collects various types of information
about the application in accordance with one embodiment.
Some kinds of information can be gathered using standard
compiler techniques such as inter-procedural dataflow
analysis and data dependence analysis, some can be provided
by the user, and some are gathered using specialized
techniques. Pragmatic analysis, for example, analyzes the
application to see what sort of language constructs and
programming idioms it contains.

The information gathered during the preprocessing pass is
used to select and apply appropriate code transformations.
All types of language constructs in the application can be the
subject of obfuscation: for example, classes can be split or
merged, methods can be changed or created, new control and
data structures can be created and original ones modified.
New constructs added to the application can be selected to
be as similar as possible to the ones in the source application,
based on the pragmatic information gathered during the
preprocessing pass.

The transformation process is repeated until the required
potency has been achieved or the maximum cost has been
exceeded. The output of the tool is a new application,
functionally equivalent to the original one, normally given
as a set of Java class files. The tool will also be able to
produce Java source files annotated with information about
which transformations have been applied, and how the
obfuscated code relates to the original source. The annotated
source will be useful for debugging.

4 CLASSIFYING OBFUSCATING
TRANSFORMATIONS

In the remainder of this detailed description of the preferred
embodiments we will describe, classify, and evaluate
various obfuscating transformations. We start by formalizing
the notion of an obfuscating transformation:

Definition 1 (Obfuscating Transformation) Let $P \xrightarrow{T} P'$ be a
legal obfuscating transformation in which the following
conditions must hold:

  If P fails to terminate or terminates with an error
    condition, then P' may or may not terminate.
  Otherwise, P' must terminate and produce the same output
    as P.

Observable behavior is defined loosely as "behavior as
experienced by the user." This means that P' may have
side-effects (such as creating files or sending messages over
the Internet) that P does not, as long as these side effects are
not experienced by the user. Note that we do not require P
and P' to be equally efficient. In fact, many of our transformations
will result in P' being slower or using more memory
than P.

The main dividing line between different classes of obfuscation
techniques is shown in FIG. 2c. We primarily classify
an obfuscating transformation according to the kind of
information it targets. Some simple transformations target
the lexical structure (the layout) of the application, such as
source code formatting, and names of variables. In one
embodiment, the more sophisticated transformations that we
are interested in target either the data structures used by the
application or its flow of control.

Secondly, we classify a transformation according to the
kind of operation it performs on the targeted information. As
can be seen from FIGS. 2d through 2g, there are several
transformations that manipulate the aggregation of control
or data. Such transformations typically break up abstractions
created by the programmer, or construct new bogus abstractions
by bundling together unrelated data or control.

Similarly, some transformations affect the ordering of
data or control. In many cases the order in which two items
are declared or two computations are performed has no
effect on the observable behavior of the program. There can,
however, be much useful information embedded in the
chosen order, to the programmer who wrote the program as
well as to a reverse engineer. The closer two items or events
are in space or time, the higher the likelihood that they are
related in one way or another. Ordering transformations try
to exploit this by randomizing the order of declarations or
computations.

5 EVALUATING OBFUSCATION
TRANSFORMATIONS

Before we can attempt to design any obfuscating
transformations, we should be able to evaluate the quality of
such a transformation. In this section we will attempt to
classify transformations according to several criteria: how
much obscurity they add to the program (e.g., potency), how
difficult they are to break for a deobfuscator (e.g.,
resilience), and how much computational overhead they add
to the obfuscated application (e.g., cost).

5.1 Measures of Potency

We will first define what it means for a program P' to be
more obscure (or complex or unreadable) than a program P.
Any such metric will, by definition, be relatively vague,
because it must be based (in part) on human cognitive
abilities.

Fortunately, we can draw upon the vast body of work in
the Software Complexity Metrics branch of Software Engineering.
In this field, metrics are designed with the intent to
aid the construction of readable, reliable, and maintainable
software. The metrics are frequently based on counting
various textual properties of the source code and combining
these counts into a measure of complexity. While some of
the formulas that have been proposed have been derived
from empirical studies of real programs, others have been
purely speculative.

The detailed complexity formulas found in the metrics'
literature can be used to derive general statements, such as:
"if programs P and P' are identical except that P' contains
more of property q than P, then P' is more complex than P."
Given such a statement, we can attempt to construct a
transformation that adds more of the q-property to a
program, knowing that this is likely to increase its obscurity.

FIG. 7 is a table that tabulates some of the more popular
complexity measures, in which E(x) is the complexity of a
software component x, F is a function or method, C is a
class, and P is a program. When used in a software construction
project the typical goal is to minimize these
measures. In contrast, when obfuscating a program we
generally want to maximize the measures.

The complexity metrics allow us to formalize the concept
of potency and will be used below as a measure of the
usefulness of a transformation. Informally, a transformation
is potent if it does a good job confusing Bob, by hiding the
intent of Alice's original code. In other words, the potency
of a transformation measures how much more difficult the
obfuscated code is to understand (for a human) than the
original code. This is formalized in the following definition:
Definition 2 (Transformation Potency) Let T be a behavior-conserving
transformation, such that $P \xrightarrow{T} P'$ transforms a
source program P into a target program P'. Let E(P) be the
complexity of P, as defined by one of the metrics of FIG. 7.

$T_{pot}(P)$, the potency of T with respect to a program P, is
a measure of the extent to which T changes the complexity
of P. It is defined as

$$T_{pot}(P) = E(P')/E(P) - 1.$$

T is a potent obfuscating transformation if $T_{pot}(P) > 0$.

For the purposes of this discussion, we will measure 
potency on a three-point scale, (low, medium, high). 
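
As a rough illustration of how a potency figure could be computed, the sketch below assumes potency is the relative increase in complexity, E(P')/E(P) - 1, and uses a deliberately crude stand-in for the cyclomatic metric (one plus the number of `if`, `while`, and `for` keywords found in the text). The class name `PotencySketch` and both method names are hypothetical and do not come from Kava.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only: potency as the relative increase of a complexity metric. */
final class PotencySketch {
    private static final Pattern PREDICATE_KEYWORD = Pattern.compile("\\b(if|while|for)\\b");

    /** Crude stand-in for cyclomatic complexity: 1 + number of predicate keywords. */
    static int cyclomatic(String source) {
        Matcher m = PREDICATE_KEYWORD.matcher(source);
        int predicates = 0;
        while (m.find()) {
            predicates++;
        }
        return predicates + 1;
    }

    /** Assumed form of the potency measure: E(P')/E(P) - 1. */
    static double potency(double complexityBefore, double complexityAfter) {
        if (complexityBefore <= 0) {
            throw new IllegalArgumentException("complexity of the original must be positive");
        }
        return complexityAfter / complexityBefore - 1.0;
    }

    public static void main(String[] args) {
        String original = "main() { S1; S2; }";
        String obfuscated = "main() { S1; if (5==2) S1; S2; if (1>2) S2; }";
        double pot = potency(cyclomatic(original), cyclomatic(obfuscated));
        System.out.println("potency = " + pot);
    }
}
```

Under this assumed metric any transformation that adds predicates scores as potent, which is why potency alone is not a sufficient measure of quality.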

The observations in Table 1 make it possible for us to list
some desirable properties of a transformation T. In order for
T to be a potent obfuscating transformation, it should

  increase overall program size ($\mu_1$) and introduce new
    classes and methods ($\mu_7^a$).
  introduce new predicates ($\mu_2$) and increase the nesting
    level of conditional and looping constructs ($\mu_3$).
  increase the number of method arguments ($\mu_5$) and inter-class
    instance variable dependencies ($\mu_7^d$).
  increase the height of the inheritance tree ($\mu_7^{b,c}$).
  increase long-range variable dependencies ($\mu_4$).

5.2 Measures of Resilience 

At first glance it would seem that increasing $T_{pot}(P)$ would
be trivial. To increase the $\mu_2$ metric, for example, all we have
to do is to add some arbitrary if-statements to P:



    main() {                  main() {
        S1;                       S1;
        S2;          =T=>         if (5==2) S1;
    }                             S2;
                                  if (1>2) S2;
                              }
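
To make concrete why such predicates offer no protection, here is a minimal sketch of an automatic "deobfuscator" that folds if-statements whose condition compares two integer literals. It works on source text with a regular expression purely for illustration; a real tool would operate on bytecode or a syntax tree. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch: folds "if (<int> <op> <int>) stmt;" where both operands are literals. */
final class ConstantPredicateFolder {
    private static final Pattern CONST_IF = Pattern.compile(
        "\\bif\\s*\\(\\s*(\\d+)\\s*(==|!=|<=|>=|<|>)\\s*(\\d+)\\s*\\)\\s*([^;]*;)");

    /** Evaluates a comparison between two literal operands. */
    static boolean evaluate(long a, String op, long b) {
        switch (op) {
            case "==": return a == b;
            case "!=": return a != b;
            case "<":  return a < b;
            case ">":  return a > b;
            case "<=": return a <= b;
            default:   return a >= b; // the only remaining operator is ">="
        }
    }

    /** Removes guarded statements that can never run and unwraps those that always run. */
    static String fold(String source) {
        Matcher m = CONST_IF.matcher(source);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            boolean taken = evaluate(
                Long.parseLong(m.group(1)), m.group(2), Long.parseLong(m.group(3)));
            String replacement = taken ? m.group(4) : "";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("S1; if (5==2) S1; S2; if (1>2) S2;"));
    }
}
```

Applied to the transformed program above, the two bogus if-statements disappear and only S1 and S2 remain, so the added complexity is undone at negligible cost.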



Unfortunately, such transformations are virtually useless,
because they can easily be undone by simple automatic
techniques. It is therefore necessary to introduce the concept
of resilience, which measures how well a transformation
holds up under attack from an automatic deobfuscator. For
example, the resilience of a transformation T can be seen as
the combination of two measures:

  Programmer Effort: the amount of time required to construct
    an automatic deobfuscator that is able to effectively
    reduce the potency of T, and
  Deobfuscator Effort: the execution time and space
    required by such an automatic deobfuscator to effectively
    reduce the potency of T.

It is important to distinguish between resilience and
potency. A transformation is potent if it manages to confuse
a human reader, but it is resilient if it confuses an automatic
deobfuscator.

We measure resilience on a scale from trivial to one-way,
as shown in FIG. 8a. One-way transformations are special,
in the sense that they can never be undone. This is typically
because they remove information from the program that was
useful to the human programmer, but which is not necessary
in order to execute the program correctly. Examples include
transformations that remove formatting, and scramble variable
names.

Other transformations typically add useless information
to the program that does not change its observable behavior,
but which increases the "information load" on a human
reader. These transformations can be undone with varying
degrees of difficulty.

FIG. 8b shows that deobfuscator effort is classified as
either polynomial time or exponential time. Programmer 
effort, the work required to automate the deobfuscation of a 
transformation T, is measured as a function of the scope of 
T. This is based on the intuition that it is easier to construct 
counter-measures against an obfuscating transformation that 
only affects a small part of a procedure, than against one that 
may affect an entire program. 

The scope of a transformation is defined using terminol- 
ogy borrowed from code optimization theory: T is a local 
transformation if it affects a single basic block of a control 
flow graph (CFG), it is global if it affects an entire CFG, it 
is inter-procedural if it affects the flow of information 
between procedures, and it is an interprocess transformation 
if it affects the interaction between independently executing 
threads of control. 

Definition 3 (Transformation Resilience) Let T be a
behavior-conserving transformation, such that $P \xrightarrow{T} P'$
transforms a source program P into a target program P'.
$T_{res}(P)$ is the resilience of T with respect to a program P.

$T_{res}(P)$ is one-way if information is
removed from P such that P cannot be reconstructed from P'.
Otherwise,

$$T_{res} = \mathrm{Resilience}(T_{\text{Deobfuscator effort}}, T_{\text{Programmer effort}}),$$

in which Resilience is the function defined in the matrix in
FIG. 8b.

5.3 Measures of Execution Cost 

In FIG. 2b, we see that potency and resilience are two of 
the three components describing the quality of a transfor- 
mation. The third component, the cost of a transformation, 
is the execution time or space penalty that a transformation 
incurs on an obfuscated application. We classify the cost on 
a four-point scale (free, cheap, costly, dear), in which each 
point is defined below: 

Definition 5 (Transformation Cost) Let T be a behavior-conserving
transformation, such that $T_{cost}(P) \in$ {dear, costly,
cheap, free}, with $T_{cost}(P)$ = free if executing P' requires O(1)
more resources than P; otherwise $T_{cost}(P)$ = cheap if executing
P' requires O(n) more resources than P; otherwise
$T_{cost}(P)$ = costly if executing P' requires $O(n^p)$, with p > 1,
more resources than P; otherwise $T_{cost}(P)$ = dear (i.e., executing
P' requires exponentially more resources than P).

It should be noted that the actual cost associated with a 
transformation depends on the environment in which it is 
applied. For example, a simple assignment statement a=5
inserted at the top-most level of a program will only incur a 
constant overhead. The same statement inserted inside an 
inner loop will have a substantially higher cost. Unless noted 
otherwise, we always provide the cost of a transformation as 
if it had been applied at the outermost nesting level of the 
source program. 

5.4 Measures of Quality 

We can now give a formal definition of the quality of an 
obfuscating transformation: 

Definition 6 (Transformation Quality) $T_{qual}(P)$, the quality
of a transformation T, is defined as the combination of the
potency, resilience, and cost of T:
$T_{qual}(P) = (T_{pot}(P), T_{res}(P), T_{cost}(P))$.
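
One way an obfuscation tool might carry this triple around is sketched below. The enum constants mirror the scales named in the text (low/medium/high potency; trivial through one-way resilience; free through dear cost), but the class names and the `acceptable` selection rule are assumptions made for illustration, not part of the described tool.

```java
/** Illustrative sketch: the quality of a transformation as a (potency, resilience, cost) triple. */
final class QualitySketch {
    enum Potency { LOW, MEDIUM, HIGH }
    enum Resilience { TRIVIAL, WEAK, STRONG, FULL, ONE_WAY }
    enum Cost { FREE, CHEAP, COSTLY, DEAR }

    static final class Quality {
        final Potency potency;
        final Resilience resilience;
        final Cost cost;

        Quality(Potency potency, Resilience resilience, Cost cost) {
            this.potency = potency;
            this.resilience = resilience;
            this.cost = cost;
        }
    }

    /**
     * Hypothetical selection rule: a transformation qualifies if it is at least as potent
     * and resilient as requested and no more expensive than the user allows.
     */
    static boolean acceptable(Quality q, Potency minPotency, Resilience minResilience, Cost maxCost) {
        return q.potency.compareTo(minPotency) >= 0
            && q.resilience.compareTo(minResilience) >= 0
            && q.cost.compareTo(maxCost) <= 0;
    }

    public static void main(String[] args) {
        // Formatting removal is characterized in the text as low potency, one-way, and free.
        Quality removeFormatting = new Quality(Potency.LOW, Resilience.ONE_WAY, Cost.FREE);
        System.out.println(acceptable(removeFormatting, Potency.MEDIUM, Resilience.WEAK, Cost.CHEAP));
    }
}
```

Because Java enums compare by declaration order, listing each scale from weakest to strongest (or cheapest to dearest) lets `compareTo` express the thresholds directly.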

5.5 Layout Transformations 

Before we explore novel transformations, we will briefly 
consider the trivial layout transformations, which, for 
example, are typical of current Java obfuscators such as
Crema. (Hans Peter Van Vliet. Crema - The Java obfuscator.
http://web.inter.nl.net/users/H.P.van.Vliet/crema.html,
January 1996). The first transformation removes the source 
code formatting information sometimes available in Java 
class files. This is a one-way transformation, because once 
the original formatting is gone it cannot be recovered; it is 
a transformation with low potency, because there is very 
little semantic content in formatting, and no great confusion 
is introduced when that information is removed; finally, this 
is a free transformation, because the space and time com- 
plexity of the application is not affected. 

Scrambling identifier names is also a one-way and free 
transformation. However, it has a much higher potency than 
formatting removal, because identifiers contain a great deal 
of pragmatic information. 

6 CONTROL TRANSFORMATIONS 

In this and the next few sections we will present a 
catalogue of obfuscating transformations. Some have been 
derived from well-known transformations used in other 
areas such as compiler optimization and software 
reengineering, others have been developed for the sole 
purpose of obfuscation, in accordance with one embodiment 
of the present invention. 

In this section we will discuss transformations that 
attempt to obscure the control-flow of the source application.
As indicated in FIG. 2f, we classify these transformations
as affecting the aggregation, ordering, or computations
of the flow of control. Control aggregation transformations 
break up computations that logically belong together or 
merge computations that do not. Control ordering transfor- 
mations randomize the order in which computations are 
carried out. Computation transformations can insert new 
(redundant or dead) code, or make algorithmic changes to 
the source application. 

For transformations that alter the flow of control, a certain 
amount of computational overhead will be unavoidable. For 
Alice this means that she may have to choose between a 
highly efficient program, and one that is highly obfuscated. 
An obfuscator can assist her in this trade-off by allowing her 
to choose between cheap and expensive transformations. 

6.1 Opaque Predicates 

The real challenge when designing control-altering trans- 
formations is to make them not only cheap, but also resistant 
to attack from deobfuscators. To achieve this, many trans- 
formations rely on the existence of opaque variables and 
opaque predicates. Informally, a variable V is opaque if it 
has some property q that is known a priori to the obfuscator, 
but which is difficult for the deobfuscator to deduce. 
Similarly, a predicate P (a Boolean expression) is opaque if 
a deobfuscator can deduce its outcome only with great 
difficulty, while this outcome is well known to the obfusca- 
tor. 

Being able to create opaque variables and predicates that 
are difficult for a deobfuscator to crack is a major challenge 
to a creator of obfuscation tools, and the key to highly 
resilient control transformations. We measure the resilience 
of an opaque variable or predicate (i.e., its resistance to 
deobfuscation attacks) on the same scale as transformation 
resilience (i.e., trivial, weak, strong, full, one-way). 
Similarly, we measure the added cost of an opaque construct 
on the same scale as transformation cost (i.e., free, cheap, 
costly, dear). 

Definition 7 (Opaque Constructs) A variable V is opaque at
a point p in a program, if V has a property q at p, which is
known at obfuscation time. We write this as $V_p^q$, or $V^q$ if p
is clear from the context. A predicate P is opaque at p if its
outcome is known at obfuscation time. We write $P_p^F$ ($P_p^T$) if
P always evaluates to False (True) at p, and $P_p^?$ if P
sometimes evaluates to True and sometimes to False. Again,
p will be omitted if clear from the context. FIG. 9 shows
different types of opaque predicates. Solid lines indicate
paths that may sometimes be taken, and dashed lines indicate
paths that will never be taken.

Below we give some examples of simple opaque constructs.
These are easy to construct for the obfuscator and
equally easy to crack for the deobfuscator. Section 8 provides
examples of opaque constructs with much higher
resilience.

6.1.1 Trivial and Weak Opaque Constructs

An opaque construct is trivial if a deobfuscator can crack
it (i.e., deduce its value) by a static local analysis. An analysis
is local if it is restricted to a single basic block of a control
flow graph. FIGS. 10a and 10b provide examples of (a)
trivial opaque constructs and (b) weak opaque constructs.
We also consider an opaque variable to be trivial if it is
computed from calls to library functions with simple, well-understood
semantics. For a language like the Java™ language,
which requires all implementations to support a
standard set of library classes, such opaque variables are
easy to construct. A simple example is int $v^{[1,5]}$ = random(1,5),
in which random(a, b) is a library function that returns
an integer in the range a . . . b. Unfortunately, such opaque
variables are equally easy to deobfuscate. All that is required
is for the deobfuscator-designer to tabulate the semantics of
all simple library functions, and then pattern-match on the
function calls in the obfuscated code.
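
A minimal Java sketch of such a trivial opaque variable follows. The text's random(a, b) is not a real Java API, so the sketch substitutes `java.util.Random.nextInt(5) + 1`, which yields a value in 1..5; the class and method names are hypothetical. The obfuscator knows the range of v, so the guard `v > 5` is an always-false predicate that can protect dead code.

```java
import java.util.Random;

/** Illustrative sketch of a trivial opaque variable built from a library call. */
final class TrivialOpaqueSketch {
    /** Opaque variable: the obfuscator knows the result always lies in [1, 5]. */
    static int opaqueInRange(Random rng) {
        return 1 + rng.nextInt(5); // nextInt(5) returns 0..4
    }

    /** Always-false predicate, given that v was produced by opaqueInRange. */
    static boolean neverTaken(int v) {
        return v > 5;
    }

    public static void main(String[] args) {
        int v = opaqueInRange(new Random());
        if (neverTaken(v)) {
            System.out.println("dead code: never printed");
        }
        System.out.println("v = " + v);
    }
}
```

A deobfuscator that knows the contract of `nextInt` can bound v after a purely local analysis, which is exactly why the construct rates only as trivial.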

An opaque construct is weak if a deobfuscator can crack
it by a static global analysis. An analysis is global if it is
restricted to a single control flow graph.

6.2 Computation Transformations 

Computation Transformations fall into three categories:
hide the real control-flow behind irrelevant statements that
do not contribute to the actual computations, introduce code
sequences at the object code level for which there exist no
corresponding high-level language constructs, or remove
real control-flow abstractions or introduce spurious ones.

6.2.1 Insert Dead or Irrelevant Code 

The $\mu_2$ and $\mu_3$ metrics suggest that there is a strong
correlation between the perceived complexity of a piece of
code and the number of predicates it contains. Using opaque
predicates, we can devise transformations that introduce
new predicates in a program.

Consider the basic block $S = S_1 \ldots S_n$ in FIG. 11. In FIG.
11a, we insert an opaque predicate $P^T$ into S, essentially
splitting it in half. The $P^T$ predicate is irrelevant code,
because it will always evaluate to True. In FIG. 11b, we
again break S into two halves, and then proceed to create two
different obfuscated versions $S^a$ and $S^b$ of the second half.
$S^a$ and $S^b$ will be created by applying different sets of
obfuscating transformations to the second half of S. Hence,
it will not be directly obvious to a reverse engineer that $S^a$
and $S^b$ in fact perform the same function. We use a predicate
$P^?$ to select between $S^a$ and $S^b$ at runtime.

FIG. 11c is similar to FIG. 11b, but this time we introduce
a bug into $S^b$. The $P^T$ predicate always selects the correct
version of the code, $S^a$.
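The three variants described for FIG. 11 can be sketched in Java as follows. This is a minimal illustration under stated assumptions: the method names are hypothetical, the block S is a toy computation, and the opaque predicate used (the product of two consecutive integers is even) is deliberately simple and would be easy to crack.

```java
// Hypothetical sketch of the predicate-insertion variants of FIG. 11.
final class PredicateInsertion {

    // P^T: x * (x + 1) is a product of consecutive integers, hence even.
    // This also holds under int overflow, because the low bit of the
    // wrapped product is still 0.
    static boolean alwaysTrue(int x) {
        return (x * (x + 1)) % 2 == 0;
    }

    // Untransformed block S: a first half followed by a second half.
    static int original(int[] a) {
        int sum = 0;
        for (int v : a) sum += v;   // first half of S
        return sum * 2;             // second half of S
    }

    // FIG. 11a style: P^T splits S; the else path is never taken.
    static int splitWithTrue(int[] a) {
        int sum = 0;
        for (int v : a) sum += v;
        if (alwaysTrue(a.length)) {
            return sum * 2;
        }
        return -1;                  // dead code
    }

    // FIG. 11b style: P^? selects between two equivalent versions.
    static int twoVersions(int[] a, int selector) {
        int sum = 0;
        for (int v : a) sum += v;
        if ((selector & 1) == 0) {  // sometimes true, sometimes false
            return sum * 2;         // S^a
        } else {
            return sum + sum;       // S^b: written differently, same result
        }
    }

    // FIG. 11c style: S^b contains a bug, but P^T always selects S^a.
    static int buggyBranch(int[] a) {
        int sum = 0;
        for (int v : a) sum += v;
        if (alwaysTrue(a.length)) {
            return sum * 2;         // S^a, correct
        } else {
            return sum * 3;         // S^b, buggy and never executed
        }
    }
}
```

In a real obfuscator the two versions would be produced by applying different transformation sets, so their equivalence would be far less visible than it is here.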

6.2.2 Extend Loop Conditions

FIG. 12 shows how we can obfuscate a loop by making
the termination condition more complex. The basic idea is to
extend the loop condition with a $P^T$ or $P^F$ predicate that will
not affect the number of times the loop will execute. The
predicate we have added in FIG. 12d, for example, will
always evaluate to True because $x^2(x+1)^2 \equiv 0 \pmod 4$.

6.2.3 Convert a Reducible to a Non-Reducible Flow Graph

Often, a programming language is compiled to a native or
virtual machine code, which is more expressive than the
language itself. When this is the case, it allows us to devise
language-breaking transformations. A transformation is
language-breaking if it introduces virtual machine (or native
code) instruction sequences that have no direct correspondence
with any source language construct. When faced with
such instruction sequences a deobfuscator will either have to
try to synthesize an equivalent (but convoluted) source
language program or give up altogether.

For example, the Java™ bytecode has a goto instruction,
but the Java™ language has no corresponding goto statement.
This means that the Java™ bytecode can express
arbitrary control flow, whereas the Java™ language can only
(easily) express structured control flow. Technically, we say
that the control flow graphs produced from Java™ programs
will always be reducible, but the Java™ bytecode can
express non-reducible flow graphs.

Since expressing non-reducible flow graphs becomes very
awkward in languages without gotos, we construct a transformation
that converts a reducible flow graph to a non-reducible
one. This can be done by turning a structured loop
into a loop with multiple headers. For example, in FIG. 13a,
we add an opaque predicate to a while loop, to make it
appear that there is a jump into the middle of the loop. In
fact, this branch will never be taken.

A Java™ decompiler would have to turn a non-reducible
flow graph into one which either duplicates code or which
contains extraneous Boolean variables. Alternatively, a
deobfuscator could guess that all non-reducible flow graphs
have been produced by an obfuscator, and simply remove
the opaque predicate. To counter this we can sometimes use
the alternative transformation shown in FIG. 13b. If a
deobfuscator blindly removes $P^F$, the resulting code will be
incorrect.

In particular, FIGS. 13a and 13b illustrate a transformation
for transforming a Reducible flow graph to a Non-Reducible
Flow graph. In FIG. 13a, we split the loop body
$S_2$ into two parts ($S^a_2$ and $S^b_2$), and insert a bogus jump to
the beginning of $S^b_2$. In FIG. 13b, we also break $S_1$ into two
parts, $S^a_1$ and $S^b_1$. $S^b_1$ is moved into the loop and an opaque
predicate $P^T$ ensures that $S^b_1$ is always executed before the
loop body. A second predicate ensures that $S^b_1$ is only
executed once.

6.2.4 Remove Library Calls and Programming Idioms

Most programs written in Java rely heavily on calls to the
standard libraries. Because the semantics of the library
functions are well known, such calls can provide useful
clues to a reverse engineer. The problem is exacerbated by
the fact that references to Java library classes are always by
name, and these names cannot be obfuscated.

In many cases the obfuscator will be able to counter this
by simply providing its own versions of the standard libraries.
For example, calls to the Java Dictionary class (which
uses a hash table implementation) could be turned into calls
to a class with identical behavior, but implemented as, for
example, a red-black tree. The cost of this transformation is
not so much in execution time, but in the size of the program.

A similar problem occurs with cliches (or patterns),
common programming idioms that occur frequently in many
applications. An experienced reverse engineer will search
for such patterns to jump-start his understanding of an
unfamiliar program. As an example, consider linked lists in
Java™. The Java™ library has no standard class that provides
common list operations such as insert, delete, and
enumerate. Instead, most Java™ programmers will construct
lists of objects in an ad hoc fashion by linking them together
on a next field. Iterating through such lists is a very common
pattern in Java™ programs. Techniques invented in the field
of automatic program recognition (see Linda Mary Wills,
Automated program recognition: a feasibility demonstration.
Artificial Intelligence, 45(1-2):113-172, 1990, incorporated
herein by reference) can be used to identify common
patterns and replace them with less obvious ones. In the
linked list case, for example, we might represent the standard
list data structure with a less common one, such as
cursors into an array of elements.

6.2.5 Table Interpretation

One of the most effective (and expensive) transformations
is table interpretation. The idea is to convert a section of
code (Java bytecode in this example) into a different virtual
machine code. This new code is then executed by a virtual
machine interpreter included with the obfuscated application.
Obviously, a particular application can contain several
interpreters, each accepting a different language and executing
a different section of the obfuscated application.

Because there is usually an order of magnitude slow down
for each level of interpretation, this transformation should be
reserved for sections of code that make up a small part of the
total runtime or which need a very high level of protection.

6.2.6 Add Redundant Operands

Once we have constructed some opaque variables we can
use algebraic laws to add redundant operands to arithmetic
expressions. This will increase the $\mu_1$ metric. Obviously, this
technique works best with integer expressions where
numerical accuracy is not an issue. In the obfuscated statement
(1') below we make use of an opaque variable P whose
value is 1. In statement (2') we construct an opaque subexpression
P/Q whose value is 2. Obviously, we can let P and
Q take on different values during the execution of the
program, as long as their quotient is 2 whenever statement
(2') is reached.

[Code example: original statements and their obfuscated forms (1') and (2').]

6.2.7 Parallelize Code

Automatic parallelization is an important compiler optimization
used to increase the performance of applications
running on multi-processor machines. Our reasons for wanting
to parallelize a program, of course, are different. We
want to increase parallelism not to increase performance, but
to obscure the actual flow of control. There are two possible
operations available to us:

1. We can create dummy processes that perform no useful
task, and

2. We can split a sequential section of the application code
into multiple sections executing in parallel.

If the application is running on a single-processor
machine, we can expect these transformations to have a
significant execution time penalty. This may be acceptable in
many situations, because the resilience of these transformations
is high: static analysis of parallel programs is very
difficult, because the number of possible execution paths
through a program grows exponentially with the number of
executing processes. Parallelization also yields high levels
of potency: a reverse engineer will find a parallel program
much more difficult to understand than a sequential one.
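The two parallelization operations can be sketched with the standard java.lang.Thread class. This is a minimal sketch, not the patent's own transformation: the class name, the method names, and the two toy statements are assumptions chosen for illustration.

```java
// Hypothetical sketch: a dummy thread plus two independent statements
// moved into separate threads.
final class ParallelizeSketch {

    // Original sequential code: two statements with no data dependency.
    static int[] sequential(int x, int y) {
        int a = x * x;   // statement 1
        int b = y + 1;   // statement 2, independent of statement 1
        return new int[] {a, b};
    }

    // Obfuscated version: each statement runs in its own thread, and a
    // dummy thread performs work that contributes nothing to the result.
    static int[] parallel(int x, int y) {
        int[] out = new int[2];
        Thread t1 = new Thread(() -> out[0] = x * x);
        Thread t2 = new Thread(() -> out[1] = y + 1);
        Thread dummy = new Thread(() -> {
            long junk = 0;
            for (int i = 0; i < 1000; i++) junk += i;   // no useful task
        });
        t1.start();
        t2.start();
        dummy.start();
        try {
            // join() makes the writes to out visible to this thread.
            t1.join();
            t2.join();
            dummy.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out;
    }
}
```

Both methods return the same pair of values; the parallel form simply hides which statements belong together and in what order they run.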
As shown in FIG. 14, a section of code can be easily
parallelized if it contains no data dependencies. For
example, if $S_1$ and $S_2$ are two data-independent statements
they can be run in parallel. In a programming language like
the Java™ language that has no explicit parallel constructs,
programs can be parallelized using calls to thread
(lightweight process) libraries.

As shown in FIG. 15, a section of code that contains data
dependencies can be split into concurrent threads by inserting
appropriate synchronization primitives, such as await
and advance (see Michael Wolfe. High Performance Compilers
For Parallel Computing. Addison-Wesley, 1996. ISBN
0-8053-2730-4, incorporated herein by reference). Such a
program will essentially be running sequentially, but the
flow of control will be shifting from one thread to the next.

6.3 Aggregation Transformations

Programmers overcome the inherent complexity of programming
by introducing abstractions. There is abstraction
on many levels of a program, but the procedural abstraction
is the most important one. For this reason, obscuring procedure
and method calls can be important to the obfuscator.
Below, we will consider several ways in which methods and
method invocations can be obscured: inlining, outlining,
interleaving, and cloning. The basic idea behind all of these
is the same: (1) code that the programmer aggregated into a
method (presumably because it logically belonged together)
should be broken up and scattered over the program and (2)
code that seems not to belong together should be aggregated
into one method.

6.3.1 Inline and Outline Methods

Inlining is, of course, an important compiler optimization.
It is also an extremely useful obfuscation transformation,
because it removes procedural abstractions from the program.
Inlining is a highly resilient transformation (it is
essentially one-way), because once a procedure call has been
replaced with the body of the called procedure and the
procedure itself has been removed, there is no trace of the
abstraction left in the code. FIG. 16 shows how procedures
P and Q are inlined at their call-sites, and then removed from
the code.

Outlining (turning a sequence of statements into a
subroutine) is a very useful companion transformation to
inlining. We create a bogus procedural abstraction by
extracting the beginning of Q's code and the end of P's code
into a new procedure R.

In object-oriented languages such as the Java™ language,
inlining may, in fact, not always be a fully one-way transformation.
Consider a method invocation m.P( ). The actual
procedure called will depend on the run-time type of m. In
cases when more than one method can be invoked at a
particular call site, we inline all possible methods (see
Jeffrey Dean. Whole-Program Optimization of Object-Oriented
Languages. PhD thesis, University of Washington,
1996, incorporated herein by reference) and select the appropriate
code by branching on the type of m. Hence, even
after inlining and removal of methods, the obfuscated code
may still contain some traces of the original abstractions. For
example, FIG. 17 illustrates inlining method calls. Unless
we can statically determine the type of m, all possible
methods to which m.P( ) could be bound must be inlined at
the call site.

6.3.2 Interleave Methods

The detection of interleaved code is an important and
difficult reverse engineering task.

FIG. 18 shows how we can interleave two methods
declared in the same class. The idea is to merge the bodies
and parameter lists of the methods and add an extra parameter
(or global variable) to discriminate between calls to the
individual methods. Ideally, the methods should be similar
in nature to allow merging of common code and parameters.
This is the case in FIG. 18, in which the first parameter of
M1 and M2 have the same type.

6.3.3 Clone Methods

When trying to understand the purpose of a subroutine a
reverse engineer will of course examine its signature and
body. However, equally important to understanding the
behavior of the routine are the different environments in
which it is being called. We can make this process more
difficult by obfuscating a method's call sites to make it
appear that different routines are being called, when, in fact,
this is not the case.

FIG. 19 shows how we can create several different
versions of a method by applying different sets of obfuscating
transformations to the original code. We use method
dispatch to select between the different versions at runtime.

Method cloning is similar to the predicate insertion transformations
in FIG. 11, except that here we are using method
dispatch rather than opaque predicates to select between
different versions of the code.

6.3.4 Loop Transformations

A large number of loop transformations have been
designed with the intent to improve the performance of (in
particular) numerical applications. See Bacon [2] for a
comprehensive survey. Some of these transformations are
useful to us, because they also increase the complexity
metrics, which are discussed above with respect to FIG. 7.
Loop blocking, as shown in FIG. 20a, is used to improve the
cache behavior of a loop by breaking up the iteration space
so that the inner loop fits in the cache. Loop unrolling, as
shown in FIG. 20b, replicates the body of a loop one or more
times. If the loop bounds are known at compile time the loop
can be unrolled in its entirety. Loop fission, as shown in FIG.
20c, turns a loop with a compound body into several loops
with the same iteration space.

All three transformations increase the $\mu_1$ and $\mu_2$ metrics,
because they increase the source application's total code size
and number of conditions. The loop blocking transformation
also introduces extra nesting, and hence also increases the $\mu_3$
metric.

Applied in isolation, the resilience of these transformations
is quite low. It does not require much static analysis for
a deobfuscator to reroll an unrolled loop. However, when the
transformations are combined, the resilience rises dramatically.
For example, given the simple loop in FIG. 20b, we
could first apply unrolling, then fission, and finally blocking.
Returning the resulting loop to its original form would
require a fair amount of analysis for the deobfuscator.

6.4 Ordering Transformations

Programmers tend to organize their source code to maximize
its locality. The idea is that a program is easier to read
and understand if two items that are logically related are also
physically close in the source text. This kind of locality
works on every level of the source: for example, there is
locality among terms within expressions, statements within
basic blocks, basic blocks within methods, methods within
classes, and classes within files. All kinds of spatial locality
can provide useful clues to a reverse engineer. Therefore,
whenever possible, we randomize the placement of any item
in the source application. For some types of items (methods
within classes, for example) this is trivial. In other cases
(such as statements within basic blocks) a data dependency
analysis (see David F. Bacon, Susan L. Graham, and Oliver
J. Sharp. Compiler transformations for high-performance
computing. ACM Computing Surveys, 26(4):345-420,
December 1994. http://www.acm.org/pubs/toc/Abstracts/0360-0300/197406.html.
and Michael Wolfe. High Performance
Compilers For Parallel Computing. Addison-Wesley,
1996. ISBN 0-8053-2730-4, incorporated herein by
reference) is performed to determine which reorderings are
technically valid.



These transformations have low potency (they do not add
much obscurity to the program), but their resilience is high,
in many cases one-way. For example, when the placement of
statements within a basic block has been randomized, there
will be no traces of the original order left in the resulting
code.

Ordering transformations can be particularly useful companions
to the "Inline-Outline" transformation of Section
6.3.1. The potency of that transformation can be enhanced
by (1) inlining several procedure calls in a procedure P, (2)
randomizing the order of the statements in P, and (3)
outlining contiguous sections of P's statements. This way,
unrelated statements that were previously part of several
different procedures are brought together into bogus procedural
abstractions.

In certain cases it is also possible to reorder loops, for
example by running them backwards. Such loop reversal
transformations are common in high-performance compilers
(David F. Bacon, Susan L. Graham, and Oliver J. Sharp.
Compiler transformations for high-performance computing.
ACM Computing Surveys, 26(4):345-420, December 1994.
http://www.acm.org/pubs/toc/Abstracts/0360-0300/197406.html.).

7 Data Transformations

In this section we will discuss transformations that
obscure the data structures used in the source application. As
indicated in FIG. 2e, we classify these transformations as
affecting the storage, encoding, aggregation, or ordering of
the data.

7.1 Storage and Encoding Transformations

In many cases there is a "natural" way to store a particular
data item in a program. For example, to iterate through the
elements of an array we probably would choose to allocate
a local integer variable of the appropriate size as the iteration
variable. Other variable types might be possible, but they
would be less natural and probably less efficient.

Furthermore, there is also often a "natural" interpretation
of the bit-patterns that a particular variable can hold which
is based on the type of the variable. For example, we would
normally assume that a 16-bit integer variable storing the
bit-pattern 0000000000001100 would represent the integer
value 12. Of course, these are mere conventions and other
interpretations are possible.

Obfuscating storage transformations attempt to choose
unnatural storage classes for dynamic as well as static data.
Similarly, encoding transformations attempt to choose
unnatural encodings for common data types. Storage and
encoding transformations often go hand-in-hand, but they
can sometimes be used in isolation.

7.1.1 Change Encoding

As a simple example of an encoding transformation we
will replace an integer variable i by $i' = c_1 \cdot i + c_2$, where $c_1$ and
$c_2$ are constants. For efficiency, we could choose $c_1$ to be a
power of two. In the example below, we let $c_1 = 8$ and $c_2 = 3$:

    {                              {
      int i = 1;                     int i = 11;
      while (i < 1000)     =T=>      while (i < 8003)
        ... A[i] ...;                  ... A[(i-3)/8] ...;
        i++;                           i += 8;
    }                              }

Obviously, overflow (and, in case of floating point
variables, accuracy) issues need to be addressed. We could
either determine that because of the range of the variable
(the range can be determined using static analysis techniques
or by querying the user) in question no overflow will occur,
or we could change to a larger variable type.

There will be a trade-off between resilience and potency
on one hand, and cost on the other. A simple encoding
function such as $i' = c_1 \cdot i + c_2$ in the example above, will add
little extra execution time but can be deobfuscated using
common compiler analysis techniques (Michael Wolfe,
High Performance Compilers For Parallel Computing,
Addison-Wesley, 1996. ISBN 0-8053-2730-4. and David F.
Bacon, Susan L. Graham, and Oliver J. Sharp. Compiler
transformations for high-performance computing. ACM
Computing Surveys, 26(4):345-420, December 1994.
http://www.acm.org/pubs/toc/Abstracts/0360-0300/197406.html).

7.1.2 Promote Variables

There are a number of simple storage transformations that
promote variables from a specialized storage class to a more
general class. Their potency and resilience are generally low,
but used in conjunction with other transformations they can
be quite effective. For example, in Java, an integer variable
can be promoted to an integer object. The same is true of the
other scalar types which all have corresponding "packaged"
classes. Because Java™ supports garbage collection, the
objects will be automatically removed when they are no
longer referenced. Here is an example:

    {                              {
      while (i < 9)        =T=>      while (i.value < 9)
        ... A[i] ...;                  ... A[i.value] ...;
        i++;                           i.value++;
    }                              }

It is also possible to change the lifetime of a variable. The
simplest such transform turns a local variable into a global
one which is then shared between independent procedure
invocations. For example, if procedures P and Q both
reference a local integer variable, and P and Q cannot both
be active at the same time (unless the program contains
threads, this can be determined by examining the static call
graph) then the variable can be made global and shared
between them:

    void P( ) {                    int C;
      int i; ... i ...             void P( ) {
    }                                ... C ...
    void Q( ) {            =T=>    }
      int k; ... k ...             void Q( ) {
    }                                ... C ...
                                   }

This transformation increases the $\mu_5$ metric, because the
number of global data structures referenced by P and Q is
increased.
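The two promotions of Section 7.1.2 can be sketched in Java as follows. The names are hypothetical. One assumption is worth stating: java.lang.Integer is immutable, so a field update such as i.value++ needs a mutable holder class, here called IntBox, which is introduced only for this illustration.

```java
// Hypothetical sketch of variable promotion.
final class PromoteVariables {

    // Mutable holder standing in for a "packaged" integer.
    static final class IntBox {
        int value;
        IntBox(int v) { value = v; }
    }

    // Original loop with a local int iteration variable.
    static int sumOriginal(int[] a) {
        int sum = 0;
        int i = 1;
        while (i < 9) {
            sum += a[i];
            i++;
        }
        return sum;
    }

    // Same loop with the iteration variable promoted to an object.
    static int sumPromoted(int[] a) {
        int sum = 0;
        IntBox i = new IntBox(1);
        while (i.value < 9) {
            sum += a[i.value];
            i.value++;
        }
        return sum;
    }

    // Lifetime change: p and q each used a local int; here they share
    // one static field. This is only valid because p and q are never
    // active at the same time.
    static int c;

    static int p(int n) {
        c = n * 2;
        return c + 1;
    }

    static int q(int n) {
        c = n - 3;
        return c * c;
    }
}
```

The shared field would be unsafe if p and q could run concurrently, which is why the text requires checking the static call graph and the absence of threads first.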
7.1.3 Split Variables

Boolean variables and other variables of restricted range
can be split into two or more variables. We will write a
variable V split into k variables $p_1, \ldots, p_k$ as $V = [p_1, \ldots, p_k]$.
Typically, the potency of this transformation will grow
with k. Unfortunately, so will the cost of the transformation,
so we usually restrict k to 2 or 3.

To allow a variable V of type T to be split into two
variables p and q of type U requires us to provide three
pieces of information: (1) a function f(p, q) that maps the
values of p and q into the corresponding value of V, (2) a
function g(V) that maps the value of V into the corresponding
values of p and q, and (3) new operations (corresponding
to the primitive operations on values of type T) cast in terms
of operations on p and q. In the remainder of this section we
will assume that V is of type Boolean, and p and q are small
integer variables.

FIG. 21a shows a possible choice of representation for
split Boolean variables. The table indicates that if V has been
split into p and q, and if, at some point in the program, p=q=0
or p=q=1, then that corresponds to V being False. Similarly,
p=0, q=1 or p=1, q=0 corresponds to True.

Given this new representation, we have to devise substitutions
for various built-in Boolean operations (e.g., &, or).
One approach is to provide a run-time lookup table for each
operator. Tables for "AND" and "OR" are shown in FIGS.
21c and 21d, respectively. Given two Boolean variables
$V_1 = [p, q]$ and $V_2 = [r, s]$, $V_1$ & $V_2$ is computed as AND[2p+q,
2r+s].

In FIG. 21e, we show the result of splitting three Boolean
variables A = [a1, a2], B = [b1, b2], and C = [c1, c2]. An interesting
aspect of our chosen representation is that there are
several possible ways to compute the same Boolean expression.
Statements (3') and (4') in FIG. 21e, for example, look
different, although they both assign False to a variable.
Similarly, while statements (5') and (6') are completely
different, they both compute A & B.

The potency, resilience, and cost of this transformation all
grow with the number of variables into which the original
variable is split. The resilience can be further enhanced by
selecting the encoding at run-time. In other words, the
run-time look-up tables of FIGS. 21b through 21d are not
constructed at compile-time (which would make them susceptible
to static analyses) but by algorithms included in the
obfuscated application. This, of course, would prevent us
from using in-line code to compute primitive operations, as
done in statement (6') in FIG. 21e.

7.1.4 Convert Static to Procedural Data

Static data, particularly character strings, contain much
useful pragmatic information to a reverse engineer. A technique
for obfuscating a static string is to convert it into a
program that produces the string. The program (which
could be a DFA or a Trie traversal) could possibly produce
other strings as well.

As an example, consider a function G of FIG. 22, which
is constructed to obfuscate the strings "AAA", "BAAAA",
and "CCB". The values produced by G are G(1)="AAA",
G(2)="BAAAA", G(3)=G(5)="CCB", and G(4)="XCB"
(which is not actually used in the program). For other
argument values, G may or may not terminate.

Aggregating the computation of all static string data into
just one function is, of course, highly undesirable. Much
higher potency and resilience is achieved if the G-function
was broken up into smaller components that were embedded
into the "normal" control flow of the source program.

It is interesting to note that we can combine this technique
with the table interpretation transformation of Section 6.2.5.
The intent of that obfuscation is to convert a section of Java
bytecode into code for another virtual machine. The new
code will typically be stored as static string data in the
obfuscated program. For even higher levels of potency and
resilience, however, the strings could be converted to programs
that produce them, as discussed above.

7.2 Aggregation Transformations

In contrast to imperative and functional languages, object-oriented
languages are more data-oriented than control-oriented.
In other words, in an object-oriented program, the
control is organized around the data structures, rather than
the other way around. This means that an important part of
reverse-engineering an object-oriented application is trying
to restore the program's data structures. Conversely, it is
important for an obfuscator to try to hide these data structures.

In most object-oriented languages, there are just two ways
to aggregate data: in arrays and in objects. In the next three
sections we will examine ways in which these data structures
can be obfuscated.

7.2.1 Merge Scalar Variables

Two or more scalar variables $V_1 \ldots V_k$ can be merged into
one variable $V_M$, provided the combined ranges of $V_1 \ldots V_k$
will fit within the precision of $V_M$. For example, two 32-bit
integer variables could be merged into one 64-bit variable.
Arithmetic on the individual variables would be transformed
into arithmetic on $V_M$. As a simple example, consider
merging two 32-bit integer variables X and Y into a 64-bit
variable Z. Using the merging formula,

    $Z(X, Y) = 2^{32} \cdot Y + X$

we get the arithmetic identities in FIG. 23a. Some simple
examples are given in FIG. 23b.

In particular, FIG. 23 shows merging two 32-bit variables
X and Y into one 64-bit variable Z. Y occupies the top 32 bits
of Z, X the bottom 32 bits. If the actual range of either X or
Y can be deduced from the program, less intuitive merges
could be used. FIG. 23a gives rules for addition and multiplication
with X and Y. FIG. 23b shows some simple
examples. The example could be further obfuscated, for
example by merging (2') and (3') into Z+=47244640261.

The resilience of variable merging is quite low. A deobfuscator
only needs to examine the set of arithmetic operations
being applied to a particular variable in order to guess
that it actually consists of two merged variables. We can
increase the resilience by introducing bogus operations that
could not correspond to any reasonable operations on the
individual variables.

In the example in FIG. 23b, we could insert operations
that appear to merge Z's two halves, for example, by
rotation: if ($P^F$) Z=rotate(Z,5)

A variant of this transformation is to merge $V_1 \ldots V_k$ into
an array $V_A$, whose elements $1 \ldots k$ hold $V_1 \ldots V_k$,
of the appropriate type. If $V_1 \ldots V_k$ are object reference
variables, for example, then the element type of $V_A$ can be
any class that is higher in the inheritance hierarchy than any
of the types of $V_1 \ldots V_k$.

7.2.2 Restructure Arrays

A number of transformations can be devised for obscuring
operations performed on arrays: for example, we can split an
array into several sub-arrays, merge two or more arrays into
one array, fold an array (increasing the number of
dimensions), or flatten an array (decreasing the number of
dimensions).
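Three of the array restructurings listed above (splitting, folding, and flattening) can be sketched in Java. This is an illustrative sketch with hypothetical names; the particular index mappings (even and odd indices for splitting, row-major order for folding and flattening) are assumptions chosen for simplicity, and an obfuscator is free to pick others.

```java
// Hypothetical sketch of array splitting, folding, and flattening.
final class ArrayRestructuring {

    // Split: even-indexed elements go to the first sub-array,
    // odd-indexed elements to the second.
    static int[][] split(int[] a) {
        int[] even = new int[(a.length + 1) / 2];
        int[] odd = new int[a.length / 2];
        for (int i = 0; i < a.length; i++) {
            if (i % 2 == 0) even[i / 2] = a[i];
            else odd[i / 2] = a[i];
        }
        return new int[][] {even, odd};
    }

    // Access that replaces a[i] after splitting.
    static int getSplit(int[][] parts, int i) {
        return (i % 2 == 0) ? parts[0][i / 2] : parts[1][i / 2];
    }

    // Fold: a one-dimensional array becomes a two-dimensional one,
    // so that d[i] is stored at folded[i / cols][i % cols].
    static int[][] fold(int[] d, int cols) {
        int rows = (d.length + cols - 1) / cols;
        int[][] folded = new int[rows][cols];
        for (int i = 0; i < d.length; i++) {
            folded[i / cols][i % cols] = d[i];
        }
        return folded;
    }

    // Flatten: a rectangular two-dimensional array becomes
    // one-dimensional, so that e[r][c] is stored at flat[r * cols + c].
    static int[] flatten(int[][] e) {
        int cols = (e.length == 0) ? 0 : e[0].length;
        int[] flat = new int[e.length * cols];
        for (int r = 0; r < e.length; r++) {
            for (int c = 0; c < cols; c++) {
                flat[r * cols + c] = e[r][c];
            }
        }
        return flat;
    }
}
```

Every access in the original program has to be rewritten through the corresponding index mapping, which is where both the cost and the obscurity of the transformation come from.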
FIG. 24 shows some examples of array restructuring. In
statements (1-2) an array A is split up into two sub-arrays A1
and A2. A1 holds the elements of A that have even indices,
and A2 holds the elements with odd indices.

Statements (3-4) of FIG. 24 show how two integer arrays
B and C can be interleaved into a resulting array BC. The
elements from B and C are evenly spread over the resulting
array.

Statements (6-7) demonstrate how a one-dimensional
array D can be folded into a two-dimensional array D1.
Statements (8-9), finally, demonstrate the reverse transformation:
a two-dimensional array E is flattened into a one-dimensional
array E1.

Array splitting and folding increase the $\mu_6$ data complexity
metric. Array merging and flattening, on the other hand,
seem to decrease this measure. While this may seem to
indicate that these transformations have only marginal or
even negative potency, this, in fact, is deceptive. The problem
is that the complexity metrics of FIG. 7 fail to capture
an important aspect of some data structure transformations:
they introduce structure where there was originally none or
they remove structure from the original program. This can
greatly increase the obscurity of the program. For example,
a programmer who declares a two-dimensional array does so
for a purpose: the chosen structure somehow maps cleanly
to the data that is being manipulated. If that array is flattened
into a one-dimensional structure, a reverse engineer will
have been deprived of much valuable pragmatic information.

7.2.3 Modify Inheritance Relations

In current object-oriented languages such as the Java™
language, the main modularization and abstraction concept
is the class. Classes are essentially abstract data types that
encapsulate data (instance variables) and control (methods).
We write a class as C = (V, M), where V is the set of C's
instance variables and M its methods.

In contrast to the traditional notion of abstract data types,
two classes $C_1$ and $C_2$ can be composed by aggregation ($C_2$
has an instance variable of type $C_1$) as well as by inheritance
($C_2$ extends $C_1$ by adding new methods and instance
variables). We write inheritance as $C_2 = C_1 \cup C'_2$. $C_2$ is said
to inherit from $C_1$, its super- or parent class. The $\cup$ operator
is the function that combines the parent class with the new
properties defined in $C'_2$. The exact semantics of $\cup$ depends
on the particular programming language. In languages such
as Java, $\cup$ is usually interpreted as union when applied to the
instance variables and as overriding when applied to methods.

[...] Stan C Kwasny and John F. Buck, editors, Proceedings of
the 21st Annual Conference on Computer Science, pages
66-73, New York, N.Y., USA, February 1993. ACM Press.
ftp://st.cs.uiuc.edu/pub/papers/refactoring/refactoring-superclasses.ps,
incorporated herein by reference). Refactoring
is a two-step process. First, it is detected that two,
apparently independent classes, in fact implement similar
behavior. Secondly, features common to both classes are
moved into a new (possibly abstract) parent class. False
refactoring is a similar operation, only it is performed on two
classes $C_1$ and $C_2$ that have no common behavior. If both
classes have instance variables of the same type, these can
be moved into the new parent class $C_3$. $C_3$'s methods can be
buggy versions of some of the methods from $C_1$ and $C_2$.

7.3 Ordering Transformations

In Section 6.4 we showed that (when possible) randomizing
the order in which computations are performed is a
useful obfuscation. Similarly, it is useful to randomize the
order of declarations in the source application.

Particularly, we randomize the order of methods and
instance variables within classes and formal parameters
within methods. In the latter case, the corresponding actuals
will of course have to be reordered as well. The potency of
these transformations is low and the resilience is one-way.

In many cases it will also be possible to reorder the
elements within an array. Simply put, we provide an opaque
encoding function f(i) which maps the i:th element in the
original array into its new position of the reordered array:

8 OPAQUE VALUES AND PREDICATES

As we have seen, opaque predicates are the major building
block in the design of transformations that obfuscate
control flow. In fact, the quality of most control transformations
is directly dependent on the quality of such predicates.

In Section 6.1 we gave examples of simple opaque

According to metric U 7 , the complexity of a class C a Plates witb trivial and weak resilience. This means that 

grows with its depth (distance from The root) in the inner- 50 < he 0 f M J ue P«dicates car. be broken (an automatic deob- 

itance hierarchy and the number of its direct descendants. £ ^ ator ™* ^"l 11 " 6 , ,tl6lr Vahl6) T 8 0r 8 I 

For example, there are two basic ways in which we can aDa !>', sls - Obviously we generally require a much 

t £. j - t Yt (f i ) \ higher resistance to attack. Ideally, we would like to be able 

shownta FiaiSa ITiLZVnwM&s^Lsl show* in t0 c ° n ? , ( mcl TT ^r'.K ^ requi ' e , W ° rSt IT!^' 

FIG 25b 55 nentia * y n tne size °* tne P ro 6 ram ) 10 break but only 

Aproblem with class factoring is its low resilience; there f^™™*} time to construct. In this section we will present 

f u . j * * *i ■ 4L two such techniques. The first one is based on aliasing and 

is nothing stopping a deobfuscator from simply merging the iL . . . » «. . , . . & 

c * « i r-p ... f ( . r i. the second is based on lightweight processes, 

factored classes. To prevent this, factoring and insertion are * & r 

normally combined as shown in FIG. 25d. Another way of 8.1 Opaque Constructs Using Objects and Aliases 

increasing the resilience of these types of transformations is 60 

to make sure that new objects are created of all introduced Inter-procedural static analysis is significantly compli- 

classes. cated whenever there is a possibility of aliasing. In fact, 

FIG. 25c shows a variant of class insertion, called false precise, flow-sensitive alias analysis is undecidable in lan- 

refactoring. Refactoring is a (sometimes automatic) tech- guages with dynamic allocation, loops, and if-statements. 

nique for restructuring object-oriented programs whose 65 In this section we will exploit the difficulty of alias 

structure has deteriorated (see William F. Opdyke and Ralph analysis to construct opaque predicates that are cheap and 

E. Johnson. Creating abstract superclasses by refactoring. In resilient to automatic deobfuscation attacks. 



{ 


{ 


int A(1000J 


int i=l,A(1000]; 


while (i < 1000) 


- T -> while (i < 1000) 


. . A[i] . . .; 


. ■ -A[f(i)]. . .; 


i++; 


i++; 


} 


} 
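The alias-based idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the Node class, the helpers make_ring and step, and the ring sizes are hypothetical and do not come from the figures. Two pointers g and h each walk around their own disjoint ring, so the test "g is h" is always false, yet establishing that statically requires reasoning about which objects each pointer may alias.

```python
import random


class Node:
    """One cell of a circular singly linked list (hypothetical helper)."""

    def __init__(self, label):
        self.label = label
        self.next = self  # a single node forms a ring of length 1


def make_ring(size, label):
    """Build a ring of `size` nodes and return one node in it."""
    head = Node(label)
    for _ in range(size - 1):
        node = Node(label)
        node.next = head.next
        head.next = node
    return head


def step(ptr, hops):
    """Move a pointer `hops` positions around its own ring."""
    for _ in range(hops):
        ptr = ptr.next
    return ptr


def run(seed, rounds=100):
    """Move g and h at random and record the opaque query each round."""
    rng = random.Random(seed)
    g = make_ring(5, "G")  # g only ever points into the first ring
    h = make_ring(7, "H")  # h only ever points into the second ring
    outcomes = []
    for _ in range(rounds):
        g = step(g, rng.randrange(10))
        h = step(h, rng.randrange(10))
        outcomes.append(g is h)  # opaquely false: the rings are disjoint
    return outcomes
```

The predicate is cheap to build and to evaluate at run time, while its resilience rests on how hard the surrounding pointer manipulation is to analyze.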
8.2 Opaque Constructs Using Threads

Parallel programs are more difficult to analyze statically
than their sequential counterparts. The reason is their
interleaving semantics: n statements in a parallel region
PAR S1; S2; ...; Sn; ENDPAR can be executed in n! different ways.
In spite of this, some static analyses over parallel programs
can be performed in polynomial time [18], while others
require all n! interleavings to be considered.

In Java, parallel regions are constructed using lightweight 
processes known as threads. Java threads have (from our 
point of view) two very useful properties: (1) their sched- 
uling policy is not specified strictly by the language speci- 
fication and will hence depend on the implementation, and 
(2) the actual scheduling of a thread will depend on asyn- 
chronous events, such as generated by user interaction, and 
network traffic. Combined with the inherent interleaving 
semantics of parallel regions, this means that threads are 
very difficult to analyze statically. 

We will use these observations to create opaque predicates
(see FIG. 32) that will require worst-case exponential
time to break. The basic idea is very similar to the one used
in Section 8.1: a global data structure V is created and
occasionally updated, but kept in a state such that opaque
queries can be made. The difference is that V is updated by
concurrently executing threads.

Obviously, V can be a dynamic data structure such as the 
one created in FIG. 26. The threads would randomly move 
the global pointers g and h around in their respective 
components, by asynchronously executing calls to move and 
insert. This has the advantage of combining data races with 
interleaving and aliasing effects, for very high degrees of 
resilience. 

In FIG. 27, we illustrate these ideas with a much simpler
example where V is a pair of global integer variables X and
Y. It is based on the well-known fact from elementary
number theory that, for any integers x and y, 7y^2 - 1 does not
equal x^2.
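The number-theoretic fact can be checked directly: 7y^2 - 1 leaves remainder 6 when divided by 7, while a square can only leave remainder 0, 1, 2, or 4. The sketch below is a hedged illustration rather than the code of FIG. 27; the function names, the value ranges, and the thread counts are assumptions made for the example. Two threads keep overwriting the globals X and Y while the main thread repeatedly evaluates the predicate, which stays false whatever the interleaving.

```python
import random
import threading

X = 0
Y = 0


def mutate(seed, steps):
    """Thread body: keep overwriting the shared globals with arbitrary integers."""
    global X, Y
    rng = random.Random(seed)
    for _ in range(steps):
        X = rng.randrange(-1000, 1000)
        Y = rng.randrange(-1000, 1000)


def opaque_false():
    """x*x == 7*y*y - 1 has no integer solutions, so this is always False."""
    x, y = X, Y  # read each global once
    return x * x == 7 * y * y - 1


def squares_mod_7():
    """The residues a perfect square can leave modulo 7."""
    return {(x * x) % 7 for x in range(7)}


def sample(rounds=1000):
    """Evaluate the predicate while two threads race on X and Y."""
    threads = [threading.Thread(target=mutate, args=(s, 2000)) for s in (1, 2)]
    for t in threads:
        t.start()
    results = [opaque_false() for _ in range(rounds)]
    for t in threads:
        t.join()
    return results
```

Because the predicate holds for all integer pairs, the data race on X and Y cannot change its value; the race only makes the values of X and Y hard to predict statically.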

9 DEOBFUSCATION AND PREVENTIVE
TRANSFORMATIONS

Many of our obfuscating transformations (particularly the 
control transformations of Section 6.2) can be said to embed 
a bogus program within a real program. In other words, an 
obfuscated application really consists of two programs 
merged into one: a real program which performs a useful 
task and a bogus program which computes useless informa- 
tion. The sole purpose of the bogus program is to confuse 
potential reverse engineers by hiding the real program 
behind irrelevant code. 

The opaque predicate is the main device the obfuscator 
has at its disposal to prevent the bogus inner program from 
being easily identified and removed. For example, in FIG. 
28a, an obfuscator embeds bogus code protected by opaque 
predicates within three statements of a real program. A 
deobfuscator's task is to examine the obfuscated application
and automatically identify and remove the inner bogus
program. To accomplish this, the deobfuscator must first
identify and then evaluate opaque constructs. This process is
illustrated in FIGS. 28b through 28d.

FIG. 29 shows the anatomy of a semi-automatic deob- 
fuscation tool. It incorporates a number of techniques that 
are well known in the reverse engineering community. In the 
remainder of this section we will briefly review some of 
these techniques and discuss various counter-measures (so 
called preventive transformations) that an obfuscator can 
employ to make deobfuscation more difficult. 
9.1 Preventive Transformations

Preventive transformations, which are discussed above
with respect to FIG. 2g, are quite different in flavor from
control or data transformations. In contrast to these, their
main goal is not to obscure the program to a human reader.
Rather, they are designed to make known automatic
deobfuscation techniques more difficult (inherent preventive
transformations), or to exploit known problems in current
deobfuscators or decompilers (targeted preventive
transformations).

9.1.1 Inherent Preventive Transformations 

Inherent preventive transformations will generally have 
low potency and high resilience. 

Most importantly, they will have the ability to boost the 
resilience of other transformations. As an example, assume 
that we have reordered a for-loop to run backwards, as 
suggested in section 6.4. We were able to apply this trans- 
formation only because we could determine that the loop 
had no loop-carried data dependencies. Naturally, there is 
nothing stopping a deobfuscator from performing the same
analysis and then returning the loop to forward execution. To
prevent this, we can add a bogus data dependency to the
reversed loop:

    {                              {
      for(i=1;i<=10;i++)   -T->      int B[50];
        A[i]=i;                      for(i=10;i>=1;i--) {
    }                                  A[i]=i;
                                       B[i]+=B[i*i/2];
                                     }
                                   }



The resilience this inherent preventive transformation
adds to the loop reordering transformation depends on the
complexity of the bogus dependency and the state-of-the-art
in dependency analysis [36].
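A small Python sketch of the same transformation shows that the reversed loop with the bogus dependency still fills A exactly as the forward loop does. The function names and the sizing of the bogus array b are assumptions made for this illustration; b is sized so that every index the loop touches is in bounds.

```python
def forward(n=10):
    """Original loop: no loop-carried dependencies."""
    a = [0] * (n + 1)
    for i in range(1, n + 1):
        a[i] = i
    return a


def reversed_with_bogus_dependency(n=10):
    """Reversed loop plus a dependency through b that a deobfuscator must rule out."""
    a = [0] * (n + 1)
    # Assumed sizing: large enough for both b[i] and b[i * i // 2].
    b = [0] * (max(n, n * n // 2) + 1)
    for i in range(n, 0, -1):
        a[i] = i
        b[i] += b[i * i // 2]  # looks loop-carried; b never influences a
    return a
```

To restore forward execution, an automatic tool would first have to prove that the accesses to b carry no real dependency between iterations.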
9.1.2 Targeted Preventive Transformations 

As an example of a targeted preventive transformation,
consider the HoseMocha program (Mark D. LaDue.
HoseMocha. http://www.xynyx.demon.nl/java/HoseMocha.java,
January 1997). It was designed specifically to exploit a
weakness in the Mocha (Hans Peter Van Vliet. Mocha: The
Java decompiler, http://web.inter.nl.net/users/H.P.van.Vliet/mocha.html,
January 1996) decompiler. HoseMocha inserts extra
instructions after every return-statement in every method in
the source program. This transformation has no effect on
the behavior of the application, but it is enough to make
Mocha crash.

9.2 Identifying and Evaluating Opaque Constructs

The most difficult part of deobfuscation is identifying and
evaluating opaque constructs. Note that identification and
evaluation are distinct activities. An opaque construct can be
local (contained within a single basic block), global
(contained within a single procedure), or inter-procedural
(distributed throughout the entire program). For example,
if (x*x==(7*y*y-1))^F is a local opaque predicate, whereas
R=x*x; ...; S=7*y*y-1; ...; if (R==S)^F ... is global. If
the computation of R and S were performed in different
procedures, the construct would be inter-procedural.
Obviously, identification of a local opaque predicate is easier
than identification of an inter-procedural one.

9.3 Identification by Pattern Matching

A deobfuscator can use knowledge of the strategies
employed by known obfuscators to identify opaque
predicates. A designer of a deobfuscator could examine an
obfuscator (either by decompiling it or simply by examining
the obfuscated code it generates) and construct
pattern-matching rules that can identify commonly used opaque
predicates. This method will work best for simple local
predicates, such as x*x==(7*y*y-1) or random(1,5)<0.

To thwart attempts at pattern matching, the obfuscator
should avoid using canned opaque constructs. It is also
important to choose opaque constructs that are syntactically
similar to the constructs used in the real application.

9.4 Identification by Program Slicing

A programmer will generally find the obfuscated version
of a program more difficult to understand and reverse
engineer than the original one. The main reasons are that in
the obfuscated program (a) live "real" code will be
interspersed with dead bogus code, and (b) logically related
pieces of code will have been broken up and dispersed over
the program. Program slicing tools can be used by a reverse
engineer to counter these obfuscations. Such tools can
interactively aid the engineer to decompose a program into
manageable chunks called slices. A slice of a program P with
respect to a point p and a variable v consists of all the
statements of P that could have contributed to v's value at p.
Hence, a program slicer would be able to extract from the
obfuscated program the statements of the algorithm that
computes an opaque variable v, even if the obfuscator has
dispersed these statements over the entire program.

There are several strategies available to an obfuscator to
make slicing a less useful identification tool:

Add parameter aliases. A parameter alias is two formal
parameters (or a formal parameter and a global variable)
that refer to the same memory location. The cost of precise
inter-procedural slicing grows with the number of potential
aliases in a program, which in turn grows exponentially
with the number of formal parameters. Hence, if the
obfuscator adds aliased dummy parameters to a program it
will either substantially slow down the slicer (if precise
slices are required), or force the slicer to produce imprecise
slices (if fast slicing is required).

Add variable dependencies, as popular slicing tools such
as Unravel (James R. Lyle, Dolores R. Wallace, James R.
Graham, Keith B. Gallagher, Joseph P. Poole, and David W.
Binkley. Unravel: A CASE tool to assist evaluation of high
integrity software. Volume 1: Requirements and design.
Technical Report NISTIR 5691, U.S. Department of
Commerce, August 1995) work well for small slices, but
will sometimes require excessive time to compute larger
ones. For example, when working on a 4000 line C program,
Unravel in some cases required over 30 minutes to compute
a slice. To force this behavior, the obfuscator should attempt
to increase slice sizes, by adding bogus variable
dependencies. In the example below, we have increased the size of
the slice computing x by adding two statements which
apparently contribute to x's value, but which, in fact, do not.

    main() {                main() {
      int x=1;                int x=1;
      x = x * 3;     -T->     if (P^F) x++;
    }                         x = x + V^0;
                              x = x * 3;
                            }

9.5 Statistical Analysis

A deobfuscator can instrument an obfuscated program to
analyze the outcome of all predicates. We will call any
deobfuscation method that examines the run-time
characteristics of an obfuscated application in this way
Statistical Analysis. The deobfuscator would alert the reverse
engineer to any predicate that always returns the same truth
value over a large number of test runs, because they may
turn out to be an opaque P^T (P^F) predicate. The deobfuscator
could not blindly replace such predicates with True (False),
because this would be too dangerous. Many applications
will contain "real" predicates that check for conditions that
only happen under exceptional circumstances, and to the
deobfuscator they will appear to behave identically to an
opaque predicate. As an example, consider
if (Leap Year) ....

Statistical analysis can also be used for evaluation. When
a potential opaque predicate (e.g., P^T) in a program M has
been identified, we guess its value (True), and make a
version M' of the obfuscated program where the opaque
predicate has been replaced by the guessed value. We then
run M and M' in parallel on the same input, and compare to
see that they produce identical output. If the outputs are the
same, we can conclude that the predicate was part of the
bogus, not the real, application, as shown in FIG. 30.

We have to make sure that our chosen inputs adequately
cover all the paths in the program. Again, if the program
contains paths that are rarely taken (if (Leap Year) ...) this
will be difficult. Furthermore, generating large numbers of
correct input/output data is very difficult, particularly when
the internal structure of the application is unknown, or the
input is entered (as is often the case with Java programs)
through a complex graphical user interface.

To prevent identification by statistical analysis, the
obfuscator may choose to favor transformations that insert P^?
predicates (such as the one shown in FIG. 11b) over those
that insert P^T or P^F predicates.

Another possible counter-measure against statistical
analysis is to design opaque predicates in such a way that
several predicates have to be cracked at the same time. One
way of doing this is to let the opaque predicates have
side-effects. In the example below the obfuscator has
determined (through some sort of static flow analysis) that
statements S1 and S2 must always execute the same number
of times. The statements are obfuscated by introducing
opaque predicates which are calls to functions Q1 and Q2. Q1
and Q2 increment and decrement a global variable k:

    {                   {
      S1;                 int k=0;
      S2;       -T->      bool Q1(x) { k+=2^31; return (P1^T) }
    }                     bool Q2(x) { k-=2^31; return (P2^T) }
                          {
                            if (Q1(j)^T) S1;
                            ...
                            if (Q2(k)^T) S2;
                          }
                        }

If the deobfuscator tries to replace one (but not both)
predicates with True, k will overflow. As a result, the
deobfuscated program will terminate with an error
condition.

9.6 Evaluation by Data-Flow Analysis

Deobfuscation is similar to many types of code
optimization. Removing if (False) ... is dead code elimination and
moving identical code from if-statement branches (e.g., S1
and S1' in FIG. 28) is code hoisting, both common code
optimization techniques.

When an opaque construct has been identified we can
attempt to evaluate it. In simple cases constant propagation
using a reaching definition data-flow analysis can be
sufficient: x=5; ...; y=7; ...; if (x*x==(7*y*y-1)) ...
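For the straight-line case in the text, constant propagation collapses to tracking the last constant assigned to each variable and then folding the predicate. The sketch below assumes that simplification; a real reaching-definitions analysis works over a control-flow graph, and the function names here are hypothetical.

```python
def propagate_constants(assignments):
    """Track the constant reaching the end of a straight-line block.

    `assignments` is a list of (name, integer constant) pairs in program order.
    """
    env = {}
    for name, value in assignments:
        env[name] = value  # the latest definition is the one that reaches
    return env


def evaluate_predicate(env):
    """Fold the predicate x*x == 7*y*y - 1 using the propagated constants."""
    x, y = env["x"], env["y"]
    return x * x == 7 * y * y - 1
```

With x=5 and y=7 the predicate folds to 25 == 342, which is false, so the guarded branch can be removed as dead code.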

9.7 Evaluation by Theorem Proving

If data-flow analysis is not powerful enough to break the
opaque predicate, a deobfuscator can attempt to use a
theorem prover. Whether this is doable or not depends on the
power of state-of-the-art theorem provers (which is difficult
to ascertain) and the complexity of the theorem that needs to
be proven. Certainly, theorems that can be proved by
induction (such as x^2(x+1)^2 = 0 (mod 4)) are well within reach
of current theorem provers.

To make things more difficult, we can use theorems which
are known to be difficult to prove, or for which no known
proof exists. In the example below the deobfuscator will
have to prove that the bogus loop always terminates in order
to determine that S2 is live code:

    {            -T->    {
      S2;                  n = random(1,2^32);
    }                      do
                             n = ((n%2)!=0)?3*n+1:n/2
                           while (n>1);
                           S2;
                         }



This is known as the Collatz problem. A conjecture says
that the loop will always terminate. Although there is no
known proof of this conjecture, the code is known to
terminate for all numbers up to 7*10^11. Thus, this
obfuscation is safe (the original and obfuscated code behave
identically), but is difficult to deobfuscate.
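The bogus loop can be written out in Python as a minimal sketch. The names collatz_guard and obfuscated, and the use of a seeded generator, are assumptions made for the example; s2 stands for the live statement guarded by the loop.

```python
import random


def collatz_guard(n):
    """Run the bogus do-while loop and return how many iterations it took."""
    steps = 0
    while True:  # emulates do ... while (n > 1)
        n = 3 * n + 1 if n % 2 != 0 else n // 2
        steps += 1
        if n <= 1:
            break
    return steps


def obfuscated(seed, s2):
    """Execute the bogus loop on a random start value, then the live code."""
    n = random.Random(seed).randint(1, 2**32)
    collatz_guard(n)
    return s2()
```

The loop has no effect on the result of s2; its only role is to make the liveness of s2 depend on a termination argument that no known proof supplies in general.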

9.8 Deobfuscation and Partial Evaluation

Deobfuscation also resembles partial evaluation. A partial
evaluator splits a program into two parts: the static part
which can be precomputed by the partial evaluator, and the
dynamic part which is executed at runtime. The dynamic
part would correspond to our original, unobfuscated,
program. The static part would correspond to our bogus inner
program, which, if it were identified, could be evaluated and
removed at deobfuscation time.

Like all other static inter-procedural analysis methods,
partial evaluation is sensitive to aliasing. Hence, the same
preventive transformations that were discussed in relation to
slicing also apply to partial evaluation.

10 Obfuscation Algorithms

Given the obfuscator architecture of Section 3, the
definition of obfuscation quality in Section 5, and the discussion
of various obfuscating transformations in Section 6 through
Section 9, we are now in a position to present more detailed
algorithms, in accordance with one embodiment of the
present invention.

The top-level loop of an obfuscation tool can have this
general structure:

    WHILE NOT Done(A) DO
       S := SelectCode(A);
       T := SelectTransform(S);
       A := Apply(T, S);
    END;

SelectCode returns the next source code object to be 
obfuscated. SelectTransform returns the transformation 
which should be used to obfuscate the particular source code 
object. Apply applies the transformation to the source code 
object and updates the application accordingly. Done deter- 
mines when the required level of obfuscation has been 
attained. The complexity of these functions will depend on 
the sophistication of the obfuscation tool. At the simplistic 
end of the scale, SelectCode and SelectTransform could 
simply return random source code object/transformations, 
and Done could terminate the loop when the size of the 
application exceeds a certain limit. Normally, such behavior 
is insufficient. 
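The simplistic end of the scale described above can be sketched in a few lines of Python. Everything concrete here is an assumption for illustration: the application is modeled as a dict from routine names to lists of statements, a transformation is any callable that returns a new statement list, and insert_dead_statement is a hypothetical transformation.

```python
import random


def obfuscate(app, transforms, size_limit, seed=0):
    """Random selection loop: stop once the application exceeds size_limit.

    Assumes every transformation makes its target strictly larger;
    otherwise the loop might never terminate.
    """
    rng = random.Random(seed)

    def size(a):
        return sum(len(body) for body in a.values())

    while size(app) <= size_limit:          # NOT Done(A)
        name = rng.choice(sorted(app))      # SelectCode(A): random object
        transform = rng.choice(transforms)  # SelectTransform(S): random choice
        app[name] = transform(app[name])    # Apply(T, S)
    return app


def insert_dead_statement(body):
    """Hypothetical transformation: append a statement that never executes."""
    return body + ["if False: pass"]
```

Nothing in this loop looks at how well a transformation fits the code or what it costs at run time, which is the gap the more detailed algorithms address.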

Algorithm 1 gives a description of a code obfuscation tool 
with a much more sophisticated selection and termination 
behavior. In one embodiment, the algorithm makes use of 
several data structures, which are constructed by Algorithms 
5, 6, and 7: 

Ps  For each source code object S, Ps(S) is the set of
    language constructs the programmer used in S. Ps(S) is
    used to find appropriate obfuscating transformations
    for S.

A   For each source code object S, A(S)={T1->V1; ...;
    Tn->Vn} is a mapping from transformations Ti to
    values Vi, describing how appropriate it would be to
    apply Ti to S. The idea is that certain transformations
    may be inappropriate for a particular source code object
    S, because they introduce new code which is
    "unnatural" to S. The new code would look out of place in S
    and hence would be easy to spot for a reverse engineer.
    The higher the appropriateness value Vi, the better the
    code introduced by transformation Ti will fit in.

I   For each source code object S, I(S) is the obfuscation
    priority of S. I(S) describes how important it is to
    obfuscate the contents of S. If S contains an important
    trade secret then I(S) will be high; if it contains mainly
    "bread-and-butter" code I(S) will be low.

R   For each routine M, R(M) is the execution time rank of
    M. R(M)=1 if more time is spent executing M than any
    other routine.

The primary input to Algorithm 1 is an application A and
a set of obfuscating transformations {T1; T2; ...}. The
algorithm also requires information regarding each
transformation, particularly three quality functions Tres(S),
Tpot(S), and Tcost(S) (similar to their namesakes in Section
5, but returning numerical values) and a function Pt:

Tres(S) returns a measure of the resilience of
    transformation T when applied to source code object S (i.e., how
    well T will withstand an attack from an automatic
    deobfuscator).

Tpot(S) returns a measure of the potency of transformation
    T when applied to source code object S (i.e., how much
    more difficult S will be for a human to understand after
    having been obfuscated by T).

Tcost(S) returns a measure of the execution time and space
    penalty added by T to S.

Pt maps each transformation T to the set of language
    constructs that T will add to the application.
Points 1 to 3 of Algorithm 1 load the application to be
obfuscated, and build appropriate internal data structures.
Point 4 builds Ps(S), A(S), I(S), and R(M). Point 5 applies
obfuscating transformations until the required obfuscation
level has been attained or until the maximum execution time
penalty is exceeded. Point 6, finally, rewrites the new
application A'.



Algorithm 1 (Code Obfuscation)

input:

a) An application A made up of source code or object
code files C1; C2; ...

b) The standard libraries L1; L2; ... defined by the
language.

c) A set of obfuscating transformations {T1; T2; ...}.

d) A mapping Pt which, for each transformation T, gives
the set of language constructs that T will add to the
application.

e) Three functions Tres(S), Tpot(S), Tcost(S) expressing
the quality of a transformation T with respect to a
source code object S.

f) A set of input data I={I1; I2; ...} to A.

g) Two numeric values AcceptCost>0 and ReqObf>0.
AcceptCost is a measure of the maximum extra
execution time/space penalty the user will accept.
ReqObf is a measure of the amount of obfuscation
required by the user.

output: An obfuscated application A' made up of source
code or object code files.

1. Load the application C1; C2; ... to be obfuscated. The
obfuscator could either

(a) load source code files, in which case the obfuscator
would have to contain a complete compiler front-end
performing lexical, syntactic, and semantic analysis (a
less powerful obfuscator that restricts itself to purely
syntactic transformations could manage without
semantic analysis), or

(b) load object code files. If the object code retains most
or all of the information in the source code (as is the
case with Java class files), this method is preferable.

2. Load library code files L1; L2; ... referenced directly or
indirectly by the application.

3. Build an internal representation of the application. The 
choice of internal representation depends on the structure 
of the source language and the complexity of the trans- 
formations the obfuscator implements. A typical set of 
data structures might include: 

(a) A control-flow graph for each routine in A. 

(b) A call-graph for the routines in A. 

(c) An inheritance graph for the classes in A.

4. Construct mappings R(M) and Ps(S) (using Algorithm 5),
I(S) (using Algorithm 6), and A(S) (using Algorithm 7).

5. Apply the obfuscating transformations to the application. 
At each step we select a source code object S to be 
obfuscated and a suitable transformation T to apply to S. 
The process terminates when the required obfuscation 
level has been reached or the acceptable execution time 
cost has been exceeded. 

    REPEAT
       S := SelectCode(I);
       T := SelectTransform(S, A);
       Apply T to S and update relevant data structures from
       point 3;
    UNTIL Done(ReqObf, AcceptCost, S, T, I)

6. Reconstitute the obfuscated source code objects into a
new obfuscated application, A'.

Algorithm 2 (SelectCode) 

input: The obfuscation priority mapping I as computed by 

Algorithm 6. 
output: A source code object S. 

I maps each source code object S to I(S), which is a 
measure of how important it is to obfuscate S. To select the 
next source code object to obfuscate, we can treat I as a 
priority queue. In other words, we select S so that I(S) is 
maximized. 
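Treating I as a priority queue can be sketched with Python's heapq module. Since heapq is a min-heap, priorities are stored negated; the function names and the dict-based input are assumptions for this example.

```python
import heapq


def build_queue(priority):
    """Turn a mapping name -> I(S) into a max-priority queue (negated keys)."""
    heap = [(-value, name) for name, value in priority.items()]
    heapq.heapify(heap)
    return heap


def select_code(heap):
    """Return the source code object whose I(S) is currently largest."""
    return heap[0][1]


def lower_priority(heap, amount):
    """Reduce the priority of the current top object by `amount`."""
    value, name = heap[0]
    heapq.heapreplace(heap, (value + amount, name))
```

After an object has been obfuscated, lowering its priority lets other objects rise to the top of the queue on later iterations.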

Algorithm 3 (SelectTransform) 

input: 

a) A source code object S. 

b) The appropriateness mapping A as computed by
Algorithm 7.

output: A transformation T.

Any number of heuristics can be used to select the most 
suitable transformation to apply to a particular source code 
object S. However, there are two important issues to con- 
sider. Firstly, the chosen transformation must blend in natu- 
rally with the rest of the code in S. This can be handled by 
favoring transformations with a high appropriateness value 
in A(S). Secondly, we want to favor transformations which 
yield a high 'bang-for-the-buck' (i.e. high levels of obfus- 
cation with low execution time penalty). This is accom- 
plished by selecting transformations that maximize potency 
and resilience, and minimize cost. These heuristics are
captured by the following code, where w1, w2, w3 are
implementation-defined constants:

Return a transform T, such that T->V is within A(S), and
(w1*Tpot(S) + w2*Tres(S) + w3*V)/Tcost(S) is
maximized.
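The selection rule can be sketched as a weighted score maximized over the candidate transformations. The dict-based inputs and the function name are assumptions for this illustration, and the sketch also assumes every cost is strictly positive so that the division is defined.

```python
def select_transform(appropriateness, quality, w1=1.0, w2=1.0, w3=1.0):
    """Pick the transformation with the best weighted benefit per unit cost.

    appropriateness: dict T -> V, i.e. the mapping A(S) for one object S.
    quality: dict T -> (potency, resilience, cost) for that same object.
    """
    def score(t):
        pot, res, cost = quality[t]
        return (w1 * pot + w2 * res + w3 * appropriateness[t]) / cost

    return max(appropriateness, key=score)
```

For instance, with A(S) = {"split": 0.9, "flatten": 0.2} and qualities (2, 1, 4) and (3, 3, 1), the scores are 0.975 and 6.2, so "flatten" is chosen even though "split" blends in better.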
Algorithm 4 (Done)

input:

a) ReqObf, the remaining level of obfuscation. 

b) AcceptCost, the remaining acceptable execution 
time penalty. 

c) A source code object S. 

d) A transformation T 

e) The obfuscation priority mapping I. 
output: 

a) An updated ReqObf. 

b) An updated AcceptCost. 

c) An updated obfuscation priority mapping I. 

d) A Boolean return value which is TRUE if the 
termination condition has been reached. 

The Done function serves two purposes. It updates the 
priority queue I to reflect the fact that the source code object 
S has been obfuscated, and should receive a reduced priority 
value. The reduction is based on a combination of the 
resilience and potency of the transformation. Done also
updates ReqObf and AcceptCost, and determines whether the
termination condition has been reached. w1, w2, w3, w4 are
implementation-defined constants:

    I(S) := I(S) - (w1*Tpot(S) + w2*Tres(S));
    ReqObf := ReqObf - (w3*Tpot(S) + w4*Tres(S));
    AcceptCost := AcceptCost - Tcost(S);
    RETURN AcceptCost <= 0 OR ReqObf <= 0.
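A minimal Python sketch of Done follows, assuming the priority update uses w1 and w2 and the ReqObf update uses w3 and w4. The state dictionary and the argument names are hypothetical; pot, res, and cost stand for the potency, resilience, and cost of T applied to S.

```python
def done(state, s, pot, res, cost, w=(1.0, 1.0, 1.0, 1.0)):
    """Update priorities and budgets after applying T to S; report termination.

    state holds "I" (dict name -> priority), "ReqObf", and "AcceptCost".
    """
    w1, w2, w3, w4 = w
    state["I"][s] -= w1 * pot + w2 * res     # S has been obfuscated once more
    state["ReqObf"] -= w3 * pot + w4 * res   # obfuscation still required
    state["AcceptCost"] -= cost              # cost budget still available
    return state["AcceptCost"] <= 0 or state["ReqObf"] <= 0
```

The loop therefore stops either because enough obfuscation has been applied or because the cost budget is exhausted, whichever happens first.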

Algorithm 5 (Pragmatic Information) 

input: 

a) An application A. 

b) A set of input data I={I1; I2; ...} to A.
output:

a) A mapping R(M) which, for every routine M in A,
gives the execution time rank of M.

b) A mapping P s (S), which, for every source code 
object S in A, gives the set of language constructs 
used in S. 

Compute pragmatic information. This information will be 
used to choose the right type of transformation for each 
particular source code object. 

1. Compute dynamic pragmatic information (i.e., run the
application under a profiler on the input data set I
provided by the user). Compute R(M) (the execution time rank
of M) for each routine/basic block, indicating where the
application spends most of its time.

2. Compute static pragmatic information Ps(S). Ps(S)
provides statistics on the kinds of language constructs the
programmer used in S.

    FOR S := each source code object in A DO
       O := The set of operators that S uses;
       C := The set of high-level language constructs (WHILE
            statements, exceptions, threads, etc.) that S uses;
       L := The set of library classes/routines that S references;
       Ps(S) := O U C U L;
    END FOR.

Algorithm 6 (Obfuscation Priority) 

input: 

a) An application A. 

b) R(M), the rank of M. 

output: A mapping I(S) which, for each source code object 
S in A, gives the obfuscation priority of S. 

I(S) can be provided explicitly by the user, or it can be 
computed using a heuristic based on the statistical data 
gathered in Algorithm 5. Possible heuristics might be: 

1. For any routine M in A, let I(M) be inversely proportional
to the rank of M, R(M). The idea is that "if much time
is spent executing a routine M, then M is probably an
important procedure that should be heavily obfuscated."

2. Let I(S) be the complexity of S, as defined by one of the 
software complexity metrics in Table 1. Again, the 
(possibly flawed) intuition is that complex code is more 
likely to contain important trade secrets than simple code. 

Algorithm 7 (Obfuscation Appropriateness) 

input: 

a) An application A. 

b) A mapping Pt which, for each transformation T,
gives the set of language constructs T will add to the
application.

c) A mapping Ps(S) which, for each source code object
S in A, gives the set of language constructs used in
S.

output: A mapping A(S) which, for each source code 
object S in A and each transformation T, gives the 
appropriateness of T with respect to S. 
Compute the appropriateness set A(S) for each source
code object S. The mapping is based primarily on the static
pragmatic information computed in Algorithm 5.

FOR S:=each source code object in A DO
  FOR T:=each transformation DO
    V:=degree of similarity between Pt(T) and Ps(S);
    A(S):=A(S) ∪ {T ↦ V};
  END FOR
END FOR
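
Algorithm 7 leaves the "degree of similarity" unspecified. The sketch below assumes, purely for illustration, that it is the Jaccard index of the two construct sets. The function names, transformation names, and construct names are hypothetical.

```python
def jaccard(a, b):
    """Assumed similarity measure: |a ∩ b| / |a ∪ b|, or 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def appropriateness(pt, ps):
    """For each source object S, map each transformation T to V = sim(Pt(T), Ps(S))."""
    return {
        s: {t: jaccard(added, used) for t, added in pt.items()}
        for s, used in ps.items()
    }


if __name__ == "__main__":
    # Pt: constructs each transformation would add (hypothetical values).
    pt = {"insert_thread": {"Thread", "While"}, "split_loop": {"While"}}
    # Ps: constructs each source object already uses (hypothetical values).
    ps = {"M1": {"While", "If"}}
    print(appropriateness(pt, ps))
```

A transformation that introduces constructs already present in S scores higher, which matches the intent that the added code should blend in with the code around it.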

11 SUMMARY AND DISCUSSION 

We have observed that it may under many circumstances
be acceptable for an obfuscated program to behave differently
than the original one. In particular, most of our
obfuscating transformations make the target program slower
or larger than the original. In special cases we even allow the
target program to have different side-effects than the
original, or not to terminate when the original program
terminates with an error condition. Our only requirement is
that the observable behavior (the behavior as experienced by
a user) of the two programs should be identical.

Allowing such weak equivalence between original and
obfuscated program is a novel and very exciting idea.
Although various transformations are provided and
described above, many other transformations will be apparent
to one of ordinary skill in the art and can be used to
provide obfuscation for enhanced software security in
accordance with the present invention.

There is also great potential for much future research to
identify transformations not yet known. In particular, we
would like to see the following areas investigated:

1. New obfuscating transformations should be identified.

2. The interaction and ordering between different transformations
should be studied. This is similar to work in code optimization,
where the ordering of a sequence of optimizing transformations
has always been a difficult problem.

3. The relationship between potency and cost should be
studied. For a particular kind of code we would like to
know which transformations would give the best
"bang-for-the-buck" (i.e., the highest potency at the lowest
execution overhead).

For an overview of all the transformations that have been
discussed above, see FIG. 31. For an overview of the opaque
constructs that have been discussed above, see FIG. 32.
However, the present invention should not be limited to the
exemplary transformations and opaque constructs discussed
above.

11.1 The Power of Obfuscation 

Encryption and program obfuscation bear a striking
resemblance to each other. Not only do both try to hide
information from prying eyes, they also purport to do so for
a limited time only. An encrypted document has a limited
shelf-life: it is safe only for as long as the encryption
algorithm itself withstands attack, and for as long as
advances in hardware speed do not allow messages for the
chosen key-length to be routinely decrypted. The same is
true for an obfuscated application; it remains secret only for
as long as sufficiently powerful deobfuscators have yet to be
built.

For evolving applications this will not be a problem, as
long as the time between releases is shorter than the time it
takes for the deobfuscator to catch up with the obfuscator. If
this is the case, then by the time an application can be
automatically deobfuscated it is already outdated and of no
interest to a competitor.

However, if an application contains trade secrets that can 
be assumed to survive several releases, then these should be 
protected by means other than obfuscation. Partial server- 
side execution (FIG. 2(b)) seems the obvious choice, but has 
the drawback that the application will execute slowly or 
(when the network connection is down) not at all. 

11.2 Other Uses of Obfuscation 

It is interesting to note that there may be potential 
applications of obfuscation other than as discussed above. 
One possibility is to use obfuscation in order to trace 
software pirates. For example, a vendor creates a new
obfuscated version of his application for every new customer
and keeps a record of to whom each version was sold. (We can
generate different obfuscated versions of the same application
by introducing an element of randomness into the
Select Transform algorithm (Algorithm 3). Different seeds to
the random number generator will produce different versions.)
This is probably only reasonable if the application is
being sold and distributed over the net. If the vendor finds 
out that his application is being pirated, all he needs to do is 
to get a copy of the pirated version, compare it against the 
data base, and see who bought the original application. It is, 
in fact, not necessary to store a copy of every obfuscated 
version sold. It suffices to keep the random number seed that
was used to generate each version sold.
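
The seed-based bookkeeping can be sketched as follows. This is a simplified illustration under stated assumptions: the seeded choice of a transformation sequence stands in for the randomized Select Transform algorithm, the transformation names are hypothetical, and the comparison is made on the chosen sequence and not on the pirated code itself.

```python
import random

# Hypothetical transformation names; the real catalogue is given in FIG. 31.
TRANSFORMATIONS = ["split_loop", "merge_variables", "insert_dead_code",
                   "scramble_identifiers", "promote_variable"]


def choose_transformations(seed, count=8):
    """Pick a reproducible sequence of transformations from a per-customer seed."""
    rng = random.Random(seed)
    return [rng.choice(TRANSFORMATIONS) for _ in range(count)]


def trace_pirate(observed, sales):
    """Return the first customer whose stored seed reproduces the observed sequence."""
    for customer, seed in sales.items():
        if choose_transformations(seed, len(observed)) == observed:
            return customer
    return None


if __name__ == "__main__":
    sales = {"alice": 101, "bob": 202}          # the vendor stores only the seeds
    pirated = choose_transformations(sales["bob"])
    print(trace_pirate(pirated, sales))
```

Because the generator is deterministic for a given seed, the vendor can regenerate any customer's version on demand and need not archive every obfuscated copy.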

Software pirates could themselves make (illicit) use of 
obfuscation. Because the Java obfuscator we outlined above 
works at the bytecode level, there is nothing stopping a 
pirate from obfuscating a legally bought Java application. 
The obfuscated version could then be resold. When faced 
with litigation the pirate could argue that he is, in fact, not 
reselling the application that he originally bought (after all, 
the code is completely different!), but rather a legally 
reengineered version. 

Conclusion 

In conclusion, the present invention provides a computer 
implemented method and apparatus for preventing, or at 
least hampering, reverse engineering of software. While this 
may be effected at the expense of execution time or program 
size with the resulting transformed program behaving dif- 
ferently at a detailed level, it is believed that the present 
technique provides significant utility in appropriate circum- 
stances. In one embodiment, the transformed program has 
the same observable behavior as the untransformed pro- 
gram. Accordingly, the present invention allows for such 
weak equivalence between the original and obfuscated pro- 
gram. 

While the present discussion has been primarily in the 
context of hampering reverse engineering of software, other 
applications are contemplated such as watermarking soft- 
ware objects (including applications). This exploits the 
potentially distinctive nature of any single obfuscation pro- 
cedure. A vendor would create a different obfuscated version 
of an application for every customer. If pirate copies are
found, the vendor need only compare them against the original
obfuscation information database to be able to trace the 
original application. 

The particular obfuscation transformations described 
herein are not exhaustive. Further obfuscation regimes may 
be identified and used in the present novel obfuscation tool 
architecture. 
Where in the foregoing description reference has been
made to elements or integers having known equivalents, 
then such equivalents are included as if they were individu- 
ally set forth. 

Although the present invention has been described by way
of example and with reference to particular embodiments, it
is to be understood that modifications and improvements can
be made without departing from the scope of the present
invention.

What is claimed is:

1. A computer implemented method for obfuscating code, 
comprising: 

identifying one or more source code input files corre- 
sponding to source code for the code of an application 
to be processed;

selecting a required level of obfuscation (the potency); 

selecting a maximum execution time or space penalty (the 
cost); 

reading and parsing the input files;

providing information identifying data types, data 
structures, and control structures used by the applica- 
tion to be processed; 
selecting and applying obfuscating transformations to 
source code objects until the required potency has been
achieved or the maximum cost has been exceeded; and 
outputting the transformed code of the application, 
wherein the transformed code provides weak equiva- 
lence to the untransformed code. 
2. The method of claim 1, wherein at least one transfor-
mation comprises an opaque construct, the opaque construct
being constructed using aliasing and concurrency tech- 
niques. 

3. The method of claim 1, further comprising: 

outputting information about obfuscating transformations
applied to the obfuscated code and information relating 
obfuscated code of a transformed application to source 
code of the application. 

4. The method of claim 1, wherein at least one transfor-
mation is selected to preserve the observable behavior of the
code of an application.

5. The method of claim 1, further comprising: 
deobfuscating the code, the deobfuscating the code com-
prising removing any obfuscations from the obfuscated
code of an application by use of slicing, partial
evaluation, dataflow analysis, or statistical analysis.

6. A computer program embodied on a computer-readable 
medium for obfuscating code, comprising: 

logic that identifies one or more source code input files 
corresponding to source code for the code of an appli-
cation to be processed;

logic that selects a required level of obfuscation (the 
potency); 

logic that selects a maximum execution time or space
penalty (the cost);
logic that reads and parses the input files; 
logic that provides information identifying data types,
data structures, and control structures used by the
application to be processed;

logic that selects and applies obfuscating transformations
to source code objects until the required potency has
been achieved or the maximum cost has been exceeded;
and

logic that outputs the transformed code of the application,
wherein the transformed code provides weak equiva-
lence to the untransformed code.
7. The computer program of claim 6, wherein at least one
transformation comprises an opaque construct, the opaque 
construct being constructed using aliasing and concurrency 
techniques. 

8. The computer program of claim 6, further comprising: 
logic that outputs information about obfuscating transfor- 
mations applied to the obfuscated code and information 
relating obfuscated code of a transformed application to 
source code of the application. 

9. The computer program of claim 6, wherein at least one 
transformation is selected to preserve the observable behav- 
ior of the code of an application. 

10. The computer program of claim 6, further comprising: 
logic that deobfuscates the code, the deobfuscating the 

code comprising removing any obfuscations from the 
obfuscated code of an application by use of slicing, 
partial evaluation, dataflow analysis, or statistical 
analysis. 

11. An apparatus for obfuscating code, comprising: 
means for identifying one or more source code input files 

corresponding to source code for the code of an appli- 
cation to be processed; 
means for selecting a required level of obfuscation (the 
potency); 

means for selecting a maximum execution time or space 
penalty (the cost); 

means for reading and parsing the input files; 

means for providing information identifying data types, 
data structures, and control structures used by the 
application to be processed; 

means for selecting and applying obfuscating transforma- 
tions to source code objects until the required potency 
has been achieved or the maximum cost has been 
exceeded; and 

means for outputting the transformed code of the 
application, wherein the transformed code provides 
weak equivalence to the untransformed code.

12. The apparatus of claim 11, wherein the transformation 
comprises an opaque construct, the opaque construct being 
constructed using aliasing and concurrency techniques. 

13. The apparatus of claim 11, further comprising: 
means for outputting information about obfuscating trans- 
formations applied to the obfuscated code and infor- 
mation relating obfuscated code of a transformed appli- 
cation to source code of the application. 

14. The apparatus of claim 11, wherein at least one 
transformation is selected to preserve the observable behav-
ior of the code of an application.

15. The apparatus of claim 11, further comprising: 
means for deobfuscating the code, the deobfuscating the 

code comprising removing any obfuscations from the 
obfuscated code of an application by use of slicing, 
partial evaluation, dataflow analysis, or statistical 
analysis. 

16. The apparatus of claim 11, wherein the code com- 
prises Java™ bytecode. 

17. The apparatus of claim 11, wherein at least one 
transformation provides a data obfuscation, a control 
obfuscation, or a preventive obfuscation. 

18. A computer-implemented method for obfuscating 
computer code, the method including: 

loading the computer code that is to be obfuscated into a 
memory unit; 

selecting one or more obfuscation transformations to 
apply to the computer code, wherein at least one 
obfuscation transformation is one of: 
a transformation that includes converting at least one
reducible control flow graph to a non-reducible con- 
trol flow graph; 

a transformation that includes splitting at least one 
loop; 

a transformation that includes identifying a programming 
idiom that is used in the computer code, and replacing 
at least one programming construct that exemplifies the 
programming idiom with an equivalent programming 
construct that does not exemplify the programming 
idiom; 

a transformation that includes promoting at least one 

variable to a more general type; 
a transformation that includes merging two variables 

into a single variable; 
a transformation that includes splitting a variable into at 

least two variables; and 
a transformation that includes replacing at least one 

string with a call to a procedure that produces the 

string; and 

generating obfuscated computer code by applying the one 
or more obfuscation transformations to the computer 
code, wherein the obfuscated computer code is more 
resistant to reverse engineering, decompilation, or 
attack than the computer code. 

19. A method as in claim 18, further including: 
evaluating the obfuscated computer code to obtain a 

metric indicative of a level of obfuscation associated 
with the obfuscated computer code; and 
applying additional obfuscation transformations to the 
obfuscated computer code if the metric is less than a 
predefined level.

20. A method as in claim 18, further including: 
performing a preprocessing pass on the computer code, 

wherein the preprocessing pass serves to gather infor- 
mation about the computer code, and wherein perform- 
ing the preprocessing pass includes performing at least 
one of (a) data flow analysis, and (b) data dependence 
analysis on the computer code; and 
using the information gathered in the preprocessing pass 
in selecting the one or more obfuscation transforma- 
tions to apply to the computer code. 

21. A method as in claim 18, in which the computer code 
that is to be obfuscated is characterized by an absence of 
annotations pre-inserted for the purpose of facilitating a
subsequent application of obfuscation transformations to the 
computer code. 

22. A computer-implemented method for obfuscating 
computer code, the method including: 

loading the computer code that is to be obfuscated into a 

memory module; 
performing a preprocessing pass on the computer code, 

the preprocessing pass serving to gather information 

about the computer code for use in selecting and 

applying one or more obfuscation transformations to 

the computer code; 
selecting one or more obfuscation transformations to 

apply to the computer code; 
generating obfuscated computer code by applying the one 

or more obfuscation transformations to the computer 

code; 

evaluating the obfuscated computer code to obtain an 
obfuscation level associated with the computer code; 
and 

applying additional obfuscation transformations to the 
obfuscated computer code if the obfuscation level is 
less than a predefined amount; 
wherein the obfuscated computer code is rendered more
resistant to reverse engineering, decompilation, or 
attack than the computer code. 

23. A method as in claim 22, in which performing a 
preprocessing pass on the computer code includes: 

constructing one or more control flow graphs for one or 
more routines contained in the computer code. 

24. A method as in claim 22, in which performing a 
preprocessing pass on the computer code includes construct- 
ing an inheritance graph for a plurality of classes contained 
in the computer code. 

25. A method as in claim 22, in which selecting one or 
more obfuscation transformations includes: 

obtaining an obfuscation metric for each of a plurality of 

obfuscation transformations; 
obtaining a cost metric for each of the plurality of 

obfuscation transformations; and 
choosing one or more obfuscation transformations for 

which the obfuscation metric is maximized and the cost 

metric is minimized. 

26. A method as in claim 25, in which the obfuscation 
metric for a given obfuscation transformation is based, at 
least in part, on a measure of potency and a measure of 
resilience of the given obfuscation transformation. 

27. A method as in claim 25, in which the cost metric of 
a given obfuscation transformation is based, at least in part, 
on an execution time penalty and a space penalty associated 
with the given obfuscation transformation. 

28. A method as in claim 22, in which selecting one or 
more obfuscation transformations to apply to the computer 
code includes: 

evaluating an appropriateness metric for one or more 
obfuscation transformations; and 

selecting one or more obfuscation transformations for 
which the appropriateness metric is higher than the 
appropriateness metric for one or more other obfusca- 
tion transformations. 

29. A method as in claim 28, in which evaluating an 
appropriateness metric for a given obfuscation transforma- 
tion includes: 

comparing one or more programming constructs used by 
the given obfuscation transformation to one or more 
programming constructs used by at least a portion of 
the computer code; and 

assigning a value to the appropriateness metric based on 
a degree of similarity between the programming con- 
structs used by the given obfuscation transformation 
and the programming constructs used by the portion of 
the computer code. 

30. A method as in claim 22, in which the computer code 
includes one or more object code files and one or more 
library code files referenced by the one or more object code 
files. 

31. A method as in claim 22, further including: 
receiving obfuscation control information as input. 

32. A method as in claim 31, in which the obfuscation 
control information includes one or more parameters relat- 
ing to an acceptable obfuscation cost and/or a desired level 
of obfuscation. 

33. A method as in claim 31, in which the obfuscation 
control information includes one or more parameters relat- 
ing to a maximum acceptable execution time penalty and a 
maximum acceptable space penalty associated with the 
computer code after obfuscation. 

34. A method as in claim 31, in which the obfuscation 
control information includes one or more parameters indica- 
tive of a desired level of obfuscation potency and/or resil- 
ience. 
35. A method as in claim 22, further including:
receiving as input an obfuscation priority, the obfuscation 

priority being associated with at least a portion of the 
computer code, wherein the obfuscation priority com- 
prises a metric of the importance of obfuscating the 
portion of the computer code with which it is associ-
ated.

36. A method as in claim 22, in which the computer code 
includes a plurality of routines, the method further includ- 
ing: 

assigning an execution time rank to one or more routines; 
and 

associating an obfuscation priority with each of the one or 
more routines, wherein the obfuscation priority asso- 
ciated with a given routine is inversely proportional to 
the execution time rank of the given routine. 

37. A method as in claim 22, further including: 
receiving profiling data as input, the profiling data pro- 
viding some assistance in identifying relatively 
frequently-executed portions of the computer code; and

using the profiling data to control, at least in part, appli- 
cation of obfuscation transformations to the computer 
code. 

38. A method as in claim 22, further including: 
generating a file of annotated computer code, the file of 

annotated computer code providing an indication of 
how the one or more obfuscation transformations were 
applied to the computer code. 

39. A method as in claim 22, in which the one or more 
obfuscation transformations include a transformation that 
includes converting at least one reducible control flow graph 
to an irreducible control flow graph. 

40. A method as in claim 22, in which the one or more 
obfuscation transformations include a transformation that 
includes removing at least one programming idiom from the 
computer code. 

41. A method as in claim 22, in which the one or more 
obfuscation transformations include a transformation that 
includes splitting an array into at least two arrays. 

42. A method as in claim 22, in which the one or more 
obfuscation transformations include a transformation that 
includes merging two arrays into a single array. 

43. A method as in claim 22, in which the one or more 
obfuscation transformations include a transformation that 
includes restructuring an array so that it has a different 
number of dimensions. 

44. A method as in claim 22, in which the one or more 
obfuscation transformations include a transformation that 
includes adding at least one opaque programming construct 
to the computer code. 

45. A method as in claim 22, in which the one or more 
obfuscation transformations include a transformation that 
includes interleaving programming statements from at least 
two different subroutines. 

46. A method as in claim 22, in which the one or more 
obfuscation transformations include a transformation that 
includes: 

selecting a subroutine in the computer code; 

adding an obfuscated version of the subroutine to the 

computer code; and 
replacing a call to the subroutine with a call to the 

obfuscated version of the subroutine; 
wherein the computer code, after application of the one or 

more obfuscation transformations, includes: 

the subroutine; 
at least one call to the subroutine;
the obfuscated version of the subroutine; and 
at least one call to the obfuscated version of the 
subroutine. 

47. A method as in claim 46, in which the subroutine
comprises a method, procedure, function, or routine. 

48. A method as in claim 22, in which the one or more 
obfuscation transformations include a transformation that 
includes unrolling at least one loop contained in the com-
puter code.

49. A method as in claim 22, in which the one or more 
obfuscation transformations include a transformation that 
includes splitting at least one loop contained in the computer 
code. 

50. A method as in claim 22, in which the one or more 
obfuscation transformations include a transformation that
includes promoting at least one variable to a more general 
type. 

51. A method as in claim 22, in which the one or more 
obfuscation transformations include a transformation that 
includes merging two variables into a single variable.

52. A method as in claim 22, in which the one or more 
obfuscation transformations include a transformation that 
includes splitting a variable into at least two variables. 

53. A method as in claim 22, in which the one or more 
obfuscation transformations include a transformation that
includes inserting a bogus class. 

54. A method as in claim 22, in which the one or more 
obfuscation transformations include a transformation that 
includes replacing at least one string with a call to a 
procedure that produces the string.

55. A method as in claim 22, in which the one or more 
obfuscation transformations include a transformation that 
includes scrambling at least one identifier. 

56. A method as in claim 22, in which the one or more 
obfuscation transformations include a transformation that
includes inserting dead or irrelevant code. 

57. A method as in claim 22, in which the one or more 
obfuscation transformations include a transformation that 
includes inlining at least one method, procedure, or function 
call.

58. A method as in claim 22, in which the one or more 
obfuscation transformations include a transformation that 
includes outlining computer code into at least one method, 
procedure, or function. 

59. A computer program product for obfuscating a com-
puter program or a computer program module, the computer
program product including: 

computer code for gathering information about the com-
puter program or module by performing a preprocess-
ing pass on the computer program or module;

computer code for performing a plurality of obfuscation 
transformations; 

computer code for selecting an obfuscation transforma-
tion to apply to the computer program or module,
wherein the selecting is based, at least in part, on the 
information gathered about the computer program or 
module during the preprocessing pass; 

computer code for applying one or more obfuscation 
transformations to the computer program or module;

computer code for calculating a metric indicative of the 
degree to which the computer program or module is 
obfuscated; 

computer code for comparing the metric to a threshold; 
computer code for applying additional obfuscation trans-
formations to the computer program or module if the
metric is less than the threshold; and
a computer readable medium that stores the computer
codes. 

60. A computer program product as in claim 59, further 
including: 

computer code for receiving user-input regarding a 

desired level of obfuscation; and 
computer code for using, at least in part, the desired level 

of obfuscation to set the threshold. 

61. A computer program product as in claim 59, in which 
the computer code for gathering information about the 
computer program or module includes: 

computer code for performing data dependency analysis 
on the computer program or module. 

62. A computer program product as in claim 59, in which 
the computer code for performing a plurality of obfuscation 
transformations includes computer code for implementing at 
least one opaque construct. 

63. A computer program product as in claim 62, in which 
the at least one opaque construct is generated using at least 
one of (a) aliasing, or (b) concurrency techniques. 

64. A computer-implemented method for obfuscating 
computer code, the method including: 

loading the computer code that is to be obfuscated into a 
memory module; 

performing a preprocessing pass on the computer code, 
wherein the preprocessing pass serves to gather infor- 
mation about the computer code, and wherein perform- 
ing the preprocessing pass includes performing at least 
one of (a) data flow analysis, or (b) data dependence 
analysis on the computer code; 

selecting one or more obfuscation transformations to 
apply to the computer code, the one or more obfusca- 
tion transformations being selected, at least in part, 
using the information gathered in the preprocessing 
pass; and 

generating obfuscated computer code by applying the one 
or more obfuscation transformations to the computer 
code. 

65. A method as in claim 64, further including: 
constructing one or more control flow graphs for one or 

more routines contained in the computer code that is to 
be obfuscated; and 
constructing an inheritance graph for a plurality of classes 
included in the computer code. 

66. A method as in claim 64, in which selecting one or 
more obfuscation transformations to apply to the computer 
code includes: 

calculating an appropriateness metric for one or more 
obfuscation transformations, wherein calculating the 
appropriateness metric for a given obfuscation trans- 
formation includes: 

comparing one or more programming constructs used 
by the given obfuscation transformation to one or 
more programming constructs used by at least a 
portion of the computer code; and 

assigning a value to the appropriateness metric based 
on a degree of similarity between the one or more 
programming constructs used by the given obfusca- 
tion transformation and the one or more program- 
ming constructs used by the portion of the computer 
code; 

selecting one or more obfuscation transformations for 
which the appropriateness metric is higher than the 
appropriateness metric for one or more other obfusca- 
tion transformations. 
67. A method as in claim 64, in which the computer code
that is to be obfuscated is characterized by an absence of 
annotations pre-inserted for the purpose of facilitating a 
subsequent application of obfuscation transformations to the 
computer code. 

68. A computer-implemented method for obfuscating 
computer code, the method including: 

loading the computer code that is to be obfuscated into a 

memory module; 
selecting one or more obfuscation transformations to 

apply to the computer code; 
generating obfuscated computer code by applying the one 

or more obfuscation transformations to the computer 

code; 

evaluating the obfuscated computer code to obtain an 
obfuscation level associated therewith; and 

applying additional obfuscation transformations to the 
obfuscated computer code if the obfuscation level is 
less than a predefined level, wherein the obfuscated 
computer code is more resistant to reverse engineering, 
decompilation, or attack than the computer code. 

69. A method as in claim 68, further including: 
performing a preprocessing pass on the computer code 

that is to be obfuscated, wherein the preprocessing pass 
serves to gather information about the computer code; 
and 

using the information gathered in the preprocessing pass 
in selecting the one or more obfuscation transforma- 
tions to apply to the computer code. 

70. A method of obfuscating a computer program or 
computer program module, the computer program or mod- 
ule containing a first variable of a first type, the method 
including: 

performing an alteration on at least a portion of the 
computer program or module, the alteration being 
designed to render the computer program or module at 
least somewhat more resistant to reverse engineering or 
decompilation, the alteration including: 
generating a second variable and a third variable; 
replacing a reference to the first variable with a pro- 
gramming construct, wherein the programming con- 
struct is designed to use the second variable and the 
third variable to replicate the reference to the first 
variable. 

71. A method as in claim 70, in which the second variable 
is of a second type, the second type being different from the 
first type. 

72. A method as in claim 71, in which the third variable 
is of a third type, the third type being different from the first 
type and the second type. 

73. A method as in claim 71, in which the first type is 
boolean, and the second type is integer. 

74. A method as in claim 73, in which the programming 
construct includes an operation on the second variable and 
the third variable, with a first result of the operation corre- 
sponding to a true condition of the first variable, and a 
second result of the operation corresponding to a false 
condition of the first variable. 

75. A method as in claim 70, in which the programming 
construct further includes: 

a first operation which sets a value for the second variable; 
and 

a second operation which sets a value for the third 
variable. 

76. A method as in claim 70, in which the programming 
construct further includes a look-up table, and in which 
values for the second variable and the third variable can be 
used to determine a boolean value from the look-up table. 

77. A method as in claim 76, in which the look-up table
is constructed at run-time by an algorithm incorporated into
the programming construct.

78. A method of obfuscating a computer program or 
module, the computer program or module including a first 
procedure containing a first local variable of a first type and 
a second procedure containing a second local variable of a
second type, the method including:

performing an alteration on at least a portion of the
computer program or module, the alteration being
designed to render the computer program or module at
least somewhat more resistant to reverse engineering or
decompilation, the alteration including:
creating a global variable;
replacing at least one reference to the first local variable
with a reference to the global variable; and
replacing at least one reference to the second local
variable with a reference to the global variable.

79. A method as in claim 78, in which the first type is the 
same as the second type. 

80. A method as in claim 78, in which the global variable 
is of a more general type than the first type and the second
type.

81. A method as in claim 79, in which the global variable 
is of the same type as the first type and the second type. 

82. A method of obfuscating a computer program or 
module, the computer program or module including at least
one reference to a first instance of an indexed data type, the
first instance of the indexed data type including at least two
elements, the method including:
performing an alteration on at least a portion of the
computer program or module, the alteration being
designed to render the computer program or module at
least somewhat more resistant to reverse engineering or
decompilation, the alteration including:
generating a second instance of the indexed data type
and a third instance of the indexed data type;
storing at least a first element from the first instance of
the indexed data type in the second instance of the
indexed data type;
storing at least a second element from the first instance
of the indexed data type in the third instance of the
indexed data type, the second element being different
from the first element;
replacing a first reference to the first instance of the
indexed data type with a reference to the second
instance of the indexed data type; and
replacing a second reference to the first instance of the
indexed data type with a reference to the third
instance of the indexed data type;
wherein the reference to the second instance of the
indexed data type and the reference to the third
instance of the indexed data type are designed to
retrieve the first element and the second element,
respectively.
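
The array splitting of claim 82 can be sketched as follows. This is only one possible arrangement: the class `SplitArray` and the rule of sending even-indexed elements to one array and odd-indexed elements to the other are assumptions chosen for the example.

```java
// Hypothetical sketch: one int array split into two arrays by index parity.
final class SplitArray {
    private final int[] evens; // "second instance": elements at even indices
    private final int[] odds;  // "third instance": elements at odd indices

    SplitArray(int[] original) {
        evens = new int[(original.length + 1) / 2];
        odds = new int[original.length / 2];
        for (int i = 0; i < original.length; i++) {
            if (i % 2 == 0) {
                evens[i / 2] = original[i];
            } else {
                odds[i / 2] = original[i];
            }
        }
    }

    // Replaces a reference original[i] with a reference into one of the two arrays.
    int get(int i) {
        return (i % 2 == 0) ? evens[i / 2] : odds[i / 2];
    }

    public static void main(String[] args) {
        SplitArray s = new SplitArray(new int[] {10, 11, 12, 13, 14});
        System.out.println(s.get(3));
    }
}
```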

83. A method of obfuscating a computer program or 
module, the computer program or module including at least
one reference to a first instance of an indexed data type and
at least one reference to a second instance of the indexed
data type, the method including:
performing an alteration on at least a portion of the
computer program or module, the alteration being
designed to render the computer program or module at
least somewhat more resistant to reverse engineering or
decompilation, the alteration including:
generating a third instance of the indexed data type;
storing data from the first instance of the indexed data
type and data from the second instance of the
indexed data type in the third instance of the indexed
data type;
replacing a reference to the first instance of the indexed
data type with a reference to the third instance of the
indexed data type; and
replacing a reference to the second instance of the
indexed data type with a reference to the third
instance of the indexed data type.

84. A method as in claim 83, in which the indexed data 
type comprises an array. 

85. A method as in claim 83, in which the reference to the 
first instance of the indexed data type is designed to refer to 
a particular location within the first instance of the indexed 
data type, and in which replacing the reference to the first 
instance of the indexed data type further includes: 

inserting into the computer program or module a first 
programming construct used to refer to a location 
within the third instance of the indexed data type 
corresponding to the particular location within the 
first instance of the indexed data type.

86. A method as in claim 85, in which the reference to the 
second instance of the indexed data type is designed to refer 
to a particular location within the second instance of the 
indexed data type, and in which replacing the reference to 
the second instance of the indexed data type further 
includes: 

inserting into the computer program or module a second
programming construct used to refer to a location 
within the third instance of the indexed data type 
corresponding to the particular location within the 
second instance of the indexed data type. 

87. A method as in claim 86, in which:
the first programming construct uses at least a first
variable; and

the second programming construct uses at least a second
variable.

88. A method as in claim 87, in which: 

the first variable and the second variable are different, and 
in which the first programming construct and the sec- 
ond programming construct are otherwise the same. 

89. A method as in claim 83, in which the first instance of
the indexed data type comprises a string of characters and 
the second instance of the indexed data type comprises a 
string of characters. 
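
A minimal Java sketch of the array merging described in claims 83 through 86 follows. The interleaved layout, the equal-length restriction, and the names `MergedArrays`, `getA`, and `getB` are assumptions made for illustration only.

```java
// Hypothetical sketch: two int arrays merged into a single interleaved array.
final class MergedArrays {
    private final int[] merged; // "third instance" holding data from both arrays

    // Assumption for brevity: both source arrays have the same length.
    MergedArrays(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("equal lengths assumed in this sketch");
        }
        merged = new int[a.length * 2];
        for (int i = 0; i < a.length; i++) {
            merged[2 * i] = a[i];     // elements of the first array at even slots
            merged[2 * i + 1] = b[i]; // elements of the second array at odd slots
        }
    }

    // First programming construct: maps a location in the first array.
    int getA(int i) {
        return merged[2 * i];
    }

    // Second programming construct: maps a location in the second array.
    int getB(int j) {
        return merged[2 * j + 1];
    }

    public static void main(String[] args) {
        MergedArrays m = new MergedArrays(new int[] {1, 2, 3}, new int[] {7, 8, 9});
        System.out.println(m.getA(1) + " " + m.getB(1));
    }
}
```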

90. A method of obfuscating a computer program or 
module, the computer program or module including a first 
instance of an indexed data type, the first instance having n 
dimensions, the method including: 

performing an alteration on at least a portion of the 
computer program or module, the alteration being 
designed to render the computer program or module at 
least somewhat more resistant to reverse engineering or 
decompilation, the alteration including: 
generating a second instance of the indexed data type,
the second instance having m dimensions, m being
different from n;
storing data from the first instance of the indexed data
type into the second instance of the indexed data
type; and
replacing a reference to the first instance of the indexed
data type with a reference to the second instance of
the indexed data type, the reference to the second
instance of the indexed data type being designed to
retrieve a data element from the second instance of
the indexed data type that corresponds to a data
element in the first instance of the indexed data type
to which the reference to the first instance of the
indexed data type refers.

91. A method as in claim 90, in which n equals one. 
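
The case of claim 91 (n equal to one) with m equal to two can be sketched as folding a one-dimensional array into a two-dimensional one. The class `FoldedArray` and the row-major mapping are assumptions made for this example.

```java
// Hypothetical sketch: a one-dimensional array folded into two dimensions.
final class FoldedArray {
    private final int cols;
    private final int[][] folded; // "second instance" with m = 2 dimensions

    // cols must be positive; the last row may be only partly used.
    FoldedArray(int[] original, int cols) {
        this.cols = cols;
        int rows = (original.length + cols - 1) / cols;
        folded = new int[rows][cols];
        for (int i = 0; i < original.length; i++) {
            folded[i / cols][i % cols] = original[i];
        }
    }

    // Replaces a reference original[i] with the corresponding 2-D element.
    int get(int i) {
        return folded[i / cols][i % cols];
    }

    public static void main(String[] args) {
        FoldedArray f = new FoldedArray(new int[] {1, 2, 3, 4, 5, 6, 7}, 3);
        System.out.println(f.get(6));
    }
}
```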

92. A method of obfuscating a computer program or 
module, the computer program or module being designed to 
carry out one or more specified tasks and including a first 
thread, the method including: 

performing an alteration on at least a portion of the 
computer program or module, the alteration designed to 
render the computer program or module at least some- 
what more resistant to reverse engineering or 
decompilation, the alteration including: 
generating a second thread; 

inserting one or more programming statements into the 
computer program or module, the programming 
statements carrying out no function which contrib- 
utes to the one or more specified tasks, wherein at 
least one or more of the programming statements are 
designed to run in the second thread. 

93. A method as in claim 92, in which the alteration does 
not materially affect completion of the one or more specified 
tasks. 

94. A method as in claim 92, in which at least one or more 
of the programming statements are designed to run in the 
first thread. 

95. A method as in claim 92, in which the computer 
program or module and the programming statements are 
written in the Java programming language. 

96. A method as in claim 92, in which the first thread and 
the second thread are synchronized through the use of 
synchronization primitives. 

97. A method of obfuscating a computer program or 
module, the computer program or module being designed to 
carry out one or more specified tasks, the computer program 
or module containing at least one variable used in at least 
one process, the variable being set to an initial value, the 
method including: 

performing an alteration on at least a portion of the 
computer program or module, the alteration being 
designed to render the computer program or module at 
least somewhat more resistant to reverse engineering or 
decompilation without materially affecting accom- 
plishment of the one or more specified tasks, the 
alteration including: 

altering the initial value of the variable to yield an 

altered initial value; and 
altering the process to take into account the altered 

initial value. 

98. A method as in claim 97, in which the variable is an 
integer variable, and in which the variable is used for 
iteration through the process, the method further including: 

incrementing or decrementing the variable for each itera- 
tion through the process; and 

continuing iteration until the variable reaches a predefined 
ending value; 

wherein altering the process to take into account the altered 
initial value includes altering the predefined ending value. 
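
Claims 97 and 98 can be illustrated by a loop whose counter starts from an altered initial value and runs to a correspondingly altered ending value. In this sketch the constant `OFFSET` and the method names are hypothetical, and a simple additive offset is only one of many possible encodings.

```java
// Hypothetical sketch: a loop counter with an altered initial and ending value.
final class EncodedLoop {
    private static final int OFFSET = 1000; // arbitrary constant chosen for the example

    // Original process: the counter runs from 0 up to a.length.
    static int sumPlain(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Altered process: the counter starts at OFFSET, the ending value is shifted
    // by the same amount, and each use of the counter subtracts OFFSET again.
    static int sumEncoded(int[] a) {
        int sum = 0;
        for (int i = OFFSET; i < a.length + OFFSET; i++) {
            sum += a[i - OFFSET];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        System.out.println(sumPlain(data) == sumEncoded(data));
    }
}
```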

99. A method of obfuscating a computer program or 
module, the computer program or module containing a 
variable of a first type, the method including: 

performing an alteration on at least a portion of the 
computer program or module, the alteration being 
designed to render the computer program or module at 
least somewhat more resistant to reverse engineering or 

decompilation, the alteration including: 

creating a variable of a second type, the second type 
being more general than the first type; and 

replacing at least one reference in the computer pro- 
gram or module to the variable of the first type with 
a reference to the variable of the second type. 

100. A method as in claim 99, in which the variable of a 
first type is an integer variable and the variable of the second 
type is an integer object. 

101. A method of obfuscating a computer program or 
module, the computer program or module including a first 
character string, the method including: 

performing an alteration on at least a portion of the 
computer program or module, the alteration being 
designed to render the computer program or module at 
least somewhat more resistant to reverse engineering or 
decompilation, the alteration including: 
generating a programming construct designed to 

dynamically produce the first character string; and 
replacing at least one instance of the first character 

string with the programming construct or a call to the 

programming construct. 

102. A method as in claim 101, in which: 

the computer program or module further includes a sec- 
ond character string; 

and in which the programming construct accepts a vari- 
able value as an input, and produces the first character 
string or the second character string depending, at least 
in part, on the variable value. 

103. A method as in claim 102, in which the programming 
construct comprises a computational function, computa- 
tional method, procedure, subroutine, or routine. 
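
The following Java sketch illustrates claims 101 through 103 under simple assumptions: the two character strings are "ABC" and "ZYX", and a hypothetical method `produce` builds one or the other from character arithmetic according to its input, so that neither string appears as a literal in the program.

```java
// Hypothetical sketch: strings produced at run time instead of stored as literals.
final class HiddenStrings {
    // Returns "ABC" when selector is 1 and "ZYX" otherwise.
    static String produce(int selector) {
        StringBuilder sb = new StringBuilder();
        if (selector == 1) {
            for (int i = 0; i < 3; i++) {
                sb.append((char) (65 + i)); // code points 65, 66, 67
            }
        } else {
            for (int i = 2; i >= 0; i--) {
                sb.append((char) (88 + i)); // code points 90, 89, 88
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(produce(1));
        System.out.println(produce(2));
    }
}
```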

104. A method of obfuscating a computer program or 
module, the computer program or module including a first 
variable and a second variable, the method including: 

performing an alteration on at least a portion of the 
computer program or module, the alteration being 
designed to render the computer program or module at 
least somewhat more resistant to reverse engineering or 
decompilation, the alteration including: 
specifying a third variable; 

replacing a reference to the first variable with a pro- 
gramming construct, the programming construct 
including at least one programming statement 
designed to use a value of the third variable to 
determine a value of the first variable; 

replacing a reference to the second variable with the 
programming construct, the programming construct 
including at least one programming statement 
designed to use the value of the third variable to 
determine a value of the second variable. 

105. A method as in claim 104, in which the first variable 
and the second variable comprise scalar variables. 

106. A method as in claim 104, in which the value of the 
first variable is stored in a first portion of the third variable 
and the value of the second variable is stored in a second 
portion of the third variable. 
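
As a sketch of claims 104 through 106, two 32-bit integer variables can be stored in the low and high halves of one 64-bit variable. The class `PackedPair` and its accessor names are assumptions introduced for this example.

```java
// Hypothetical sketch: two int variables stored in one long variable.
final class PackedPair {
    private long packed; // "third variable": low 32 bits hold x, high 32 bits hold y

    void setX(int x) {
        packed = (packed & 0xFFFFFFFF00000000L) | (x & 0xFFFFFFFFL);
    }

    void setY(int y) {
        packed = (packed & 0x00000000FFFFFFFFL) | ((long) y << 32);
    }

    // Replaces a read of the first variable.
    int getX() {
        return (int) packed;
    }

    // Replaces a read of the second variable.
    int getY() {
        return (int) (packed >>> 32);
    }

    public static void main(String[] args) {
        PackedPair pair = new PackedPair();
        pair.setX(-5);
        pair.setY(7);
        System.out.println(pair.getX() + " " + pair.getY());
    }
}
```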

107. A method as in claim 104, in which the alteration 
further includes: 

incorporating into the computer program or module an 
operation performed on the third variable, the operation 
performing no function necessary for correct execution 
of the computer program or module. 

108. A method as in claim 107, in which the operation is 
incorporated into the computer program or module in such 




a manner that the operation is not performed during normal 
execution of the computer program or module. 

109. A method as in claim 107, in which execution of the 
operation is conditioned on a second operation, the second
operation including evaluation of an opaque predicate, the
opaque predicate being designed to evaluate in such a 
manner that the operation is not executed during normal 
execution of the computer program or module. 

110. A method as in claim 109, in which the operation 
includes a rotate operation performed on the third variable.

111. A method of obfuscating a computer program or 
module, the computer program or module including a first 
variable, the method including: 

performing an alteration on at least a portion of the
computer program or module, the alteration being
designed to render the computer program or module at
least somewhat more resistant to reverse engineering or
decompilation, the alteration including:
specifying an instance of an indexed data type, the
instance containing at least two elements; and
replacing at least one reference to the first variable with
a first programming construct, the first programming
construct including one or more programming statements
which use, at least in part, one or more values
stored in at least one of the elements of the instance
of the indexed data type.

112. A method as in claim 111, in which the indexed data 
type comprises an array. 

113. A method as in claim 111, in which the instance of the 
indexed data type comprises a string of characters. 

114. A method as in claim 111, in which:

the computer program or module includes a second
variable; and
the alteration further includes:

replacing at least one reference to the second variable
with a second programming construct, the second
programming construct including one or more programming
statements which use, at least in part, one
or more values stored in at least one of the elements
of the instance of the indexed data type.

115. A method as in claim 114, in which the first programming
construct and the second programming construct
include retrieval of an element from the instance of the
indexed data type.

116. A method of obfuscating a computer program or
module, the computer program or module including a first
variable value used in a first indexing operation on an
instance of an indexed data type, the method including:
performing an alteration on at least a portion of the
computer program or module, the alteration being
designed to render the computer program or module at
least somewhat more resistant to reverse engineering or
decompilation, the alteration including:
creating a first opaque encoding function, the first
opaque encoding function using the first variable
value to compute a second variable value for use in
indexing the instance of the indexed data type; and
replacing the first indexing operation with a second
indexing operation, the second indexing operation
using the second variable value to index the instance
of the indexed data type.

117. A method as in claim 116, in which the indexed data
type comprises an array.
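
The index encoding of claims 116 and 117 can be sketched with a small permutation of array positions. The function `f`, the fixed size `N`, and the class `PermutedArray` are assumptions for this example, and a practical encoding function would be chosen to be harder to analyze than this one.

```java
// Hypothetical sketch: array elements stored at positions computed by an encoding function.
final class PermutedArray {
    private static final int N = 7;
    private final int[] data = new int[N];

    // Encoding function: maps index i (0..6) to a distinct slot.
    // It is a bijection on 0..6 because 3 and 7 share no common factor.
    private static int f(int i) {
        return (3 * i + 1) % N;
    }

    // Replaces the original indexing operation data[i] = v.
    void set(int i, int v) {
        data[f(i)] = v;
    }

    // Replaces the original indexing operation data[i].
    int get(int i) {
        return data[f(i)];
    }

    public static void main(String[] args) {
        PermutedArray p = new PermutedArray();
        p.set(2, 42);
        System.out.println(p.get(2));
    }
}
```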

118. A method of obfuscating a computer program or 
module, the computer program or module including a loop,
the method including:

performing an alteration on at least a portion of the 
computer program or module, the alteration being 
designed to render the computer program or module at 
least somewhat more resistant to reverse engineering or 
decompilation, the alteration including reversing the 
direction of the loop. 

119. A method as in claim 118, in which reversing the 
direction of the loop includes adding a bogus data depen- 
dency to the loop. 
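
A minimal sketch of the loop reversal of claim 118 follows. The method names are hypothetical, and the example assumes a loop body whose iterations do not depend on one another, which is the condition under which reversing the direction leaves the result unchanged. The bogus data dependency of claim 119 is not shown.

```java
// Hypothetical sketch: the same loop written in forward and reversed direction.
final class ReversedLoop {
    // Original loop: indices visited in increasing order.
    static int[] fillForward(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i * i;
        }
        return a;
    }

    // Altered loop: indices visited in decreasing order.
    static int[] fillReversed(int n) {
        int[] a = new int[n];
        for (int i = n - 1; i >= 0; i--) {
            a[i] = i * i;
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.equals(fillForward(5), fillReversed(5)));
    }
}
```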

120. A method of obfuscating a computer program or 
module, the method including: 

performing an alteration on at least a portion of the 
computer program or module, the alteration being 
designed to render the computer program or module at 
least somewhat more resistant to reverse engineering or 
decompilation, the alteration including: 
incorporating an opaque computational function into 
the computer program or module, wherein evalua- 
tion of the opaque computational function depends, 
at least in part, on a value of a predefined variable; 
including a first parallel process and a second parallel 
process in the computer program or module, the first 
parallel process and the second parallel process 
determining, at least in part, the value of the pre- 
defined variable; and 
modifying the computer program or module so that at 
least one operation depends upon the opaque com- 
putational function evaluating to a predefined value. 

121. A method as in claim 120, in which: 

the first parallel process includes a first thread, and the 
second parallel process includes a second thread. 

122. A method as in claim 121, in which the first and 
second threads execute concurrently. 

123. A method as in claim 122, in which the first thread 
includes one or more programming statements that cause 
execution of the first thread to pause and to later resume. 

124. A method as in claim 120, in which the opaque 
computational function comprises an opaque predicate. 

125. A method of obfuscating a computer program or 
module, the method including: 

performing an alteration on at least a portion of the 
computer program or module, the alteration being 
designed to render the computer program or module at 
least somewhat more resistant to reverse engineering or 
decompilation, the alteration including: 
incorporating an opaque computational function into 
the computer program or module, including: 
generating a first data structure containing first data 
structure elements, each first data structure ele- 
ment including a first field capable of pointing to 
at least one other first data structure element; 
generating a first pointer pointing to a particular first 

data structure element; and 
calculating a value of the opaque computational 
function by using, at least in part, a value of the 
first pointer; 

altering the computer program or module so that at 
least one operation depends upon the opaque computational
function evaluating to a predefined value.

126. A method as in claim 125, in which: 

each first data structure element further includes an addi- 
tional field; and 
in which the alteration further includes: 
adding to the computer program or module a first 
operation that is dependent on a value of the addi- 
tional field of the particular first data structure ele- 
ment pointed to by the first pointer. 

127. A method as in claim 126, in which the additional 
field comprises a boolean field. 

128. A method as in claim 126, in which incorporating an
opaque computational function into the computer program
or module further includes:
adding a programming construct to the computer program
or module which alters the value of the first pointer, the
programming construct causing the first pointer to
point to another first data structure element reachable
from the particular first data structure element previously
pointed to by the first pointer.

129. A method as in claim 128, in which incorporating an
opaque computational function into the computer program
or module further includes adding at least one programming
statement which adds a new first data structure element to
the first data structure.

130. A method as in claim 129, in which incorporating an
opaque computational function into the computer program
or module further includes:
generating a second data structure containing second data
structure elements, each second data structure element
including a first field capable of pointing to at least one
other second data structure element;
generating a second pointer pointing to a particular second
data structure data element; and
calculating a value of the opaque computational function
by using, at least in part, a value of the second pointer.

131. A method as in claim 130, in which generating a
second data structure is performed at runtime, and in which
generating a second pointer is performed at runtime.

132. A method as in claim 130, in which:

none of the first data structure elements point to a second
data structure element; and
none of the second data structure elements point to a first
data structure element.

133. A method as in claim 132, in which the opaque
computational function is calculated, at least in part, by an
operation which results in a first state if the value of the first
pointer is equal to the value of the second pointer, and a
second state if the value of the first pointer is not equal to the
value of the second pointer; whereby the operation always
results in the second state.
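
The pointer-based opaque predicate of claims 125 through 133 can be sketched in Java with two disjoint circular lists. The class `PointerOpaque`, the ring sizes, and the stepping pattern are assumptions made for this example. Because no node of one ring points into the other, the two references can never be equal, although that fact is not apparent from the comparison alone.

```java
// Hypothetical sketch: an opaque predicate built from two disjoint linked structures.
final class PointerOpaque {
    static final class Node {
        Node next; // field capable of pointing to another element of the same structure
    }

    private Node f = ring(3); // "first pointer" into the first structure
    private Node g = ring(4); // "second pointer" into the second structure

    // Builds a circular list of n freshly allocated nodes (n >= 1) at run time.
    private static Node ring(int n) {
        Node first = new Node();
        Node current = first;
        for (int i = 1; i < n; i++) {
            current.next = new Node();
            current = current.next;
        }
        current.next = first;
        return first;
    }

    // Moves each pointer to another element reachable within its own structure.
    void step() {
        f = f.next;
        g = g.next.next;
    }

    // Compares the two pointers; the rings share no node, so this is always false.
    boolean pointersEqual() {
        return f == g;
    }

    public static void main(String[] args) {
        PointerOpaque opaque = new PointerOpaque();
        for (int i = 0; i < 10; i++) {
            opaque.step();
        }
        System.out.println(opaque.pointersEqual());
    }
}
```

An obfuscator could then guard inserted statements with `if (opaque.pointersEqual()) { ... }`, knowing they are never executed during normal operation.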

134. A method as in claim 133, in which incorporating an 
opaque computational function into the computer program 
or module further includes: 

adding a second programming construct which alters the
value of the second pointer, the second programming
construct causing the second pointer to point to another
second data structure element reachable from the particular
second data structure element previously
pointed to by the second pointer.

135. A method as in claim 132, in which the opaque 
computational function is calculated, at least in part, by an 
operation which results in a first state if a value associated 
with the first data structure element pointed to by the first
pointer is equal to a value associated with the second data
structure element pointed to by the second pointer, and
results in a second state if the value associated with the first
data structure element pointed to by the first pointer is not
equal to the value associated with the second data structure
element pointed to by the second pointer.

136. A method as in claim 135, in which the value 
associated with the first data structure element and the value 
associated with the second data structure element include 
pointers to other data elements stored in the first and second
data structures.

137. A method as in claim 130, in which incorporating an 
opaque computational function into the computer program 
or module further includes generating a third pointer, the
third pointer pointing to one of the first data structure
elements; whereby the opaque computational function is
calculated, at least in part, by an operation which results in
a first state if the value of the first pointer is equal to a value
of the third pointer, and results in a second state if the value 
of the first pointer is not equal to the value of the third 
pointer, whereby the operation may result in the first state or 
the second state. 

138. A method of obfuscating a computer program or
module, the method including: 

performing an alteration on at least a portion of the 
computer program or module, the alteration being 
designed to render the computer program or module at 
least somewhat more resistant to reverse engineering or
decompilation, the alteration including: 
incorporating a first opaque computational function
into the computer program or module; and
including a first programming construct in the computer
program or module, the first programming
construct operable to execute a first group of one or
more programming statements if the opaque computational
function computes to a first value, and to
execute a second group of one or more programming
statements if the opaque computational function
computes to a second value.

139. A method as in claim 138, in which the first group 
and the second group perform similar functions using pro- 
gramming statements which differ in at least one respect. 

140. A method as in claim 139, in which the one or more
programming statements contained in the first group and the 
one or more programming statements contained in the 
second group are chosen to obscure, at least in part, simi- 
larities between the functions performed by the first group 
and by the second group.

141. A method as in claim 138, in which the opaque 
computational function is calculated, at least in part, based 
on the value of an opaque variable. 

142. A method of obfuscating a computer program or 
module, the computer program or module including programming
statements written in a first language, the method
including: 

performing an alteration on at least a portion of the 
computer program or module to form an altered computer
program or module, the alteration being designed
to render the altered computer program or module at 
least somewhat more resistant to reverse engineering or 
decompilation than the computer program or module, 
the alteration including: 

translating at least some of the programming statements
from the first language to a second language; 

incorporating into the altered computer program or 
module at least one programming construct in the 
second language which lacks a direct correspondence
with any programming construct in the first
language. 

143. A method as in claim 142, in which the first language 
comprises a source language, and in which the second 
language comprises an object language. 

144. A method as in claim 143, in which the source
language comprises Java, and the object language comprises 
Java bytecode. 

145. A method as in claim 144, in which the programming 
construct in the second language consists, at least in part, of
a goto instruction.

146. A method as in claim 142, in which a control flow 
graph associated with the computer program or module is 
reducible, and in which a control flow graph associated with 
the altered computer program or module is irreducible. 

147. A process for obfuscating a computer program or 
module, the computer program or module including at least 
one call to a method in a first library class, the process 
including: 

creating a second library class, the second library class 
containing a second set of one or more methods 
designed to perform one or more of the same operations 
as a first set of one or more methods contained in the 
first library class, wherein the second library class is 
designed to obscure, at least in part, similarity to the 
first library class; and 

replacing in the computer program or module at least one 

call to a method in the first library class with a call to 

a method in the second library class; 
whereby the computer program or module is rendered at
least somewhat more resistant to reverse engineering or
decompilation.

148. A process as in claim 147, in which the second 
library class comprises a standard library class. 

149. A process as in claim 148, in which: 

the second library class comprises a standard Java library 
class; and 

the computer program or module is written in Java. 

150. A method of obfuscating a computer program or 
module, the method including: 

reviewing the computer program or module to identify at 
least one programming idiom; and 

replacing at least one occurrence of the programming 
idiom with one or more alternative programming con- 
structs designed to render the computer program or 
module more difficult to reverse engineer or decompile. 

151. A method as in claim 150, in which the at least one 
programming idiom includes a linked list. 

152. A method as in claim 151, in which at least one of the 
alternative programming constructs comprises an array of 
elements. 

153. A method of obfuscating a computer program or 
module, the computer program or module including a call to 
a particular procedure, the particular procedure being one of 
a group of procedures, the method including: 

performing an alteration on at least a portion of the 
computer program or module, the alteration being 
designed to render the computer program or module at 
least somewhat more resistant to reverse engineering or 
decompilation, the alteration including: 
replacing a call to the particular procedure with the 
group of procedures itself, each procedure in the 
group of procedures being located at a different 
address; and 

incorporating a programming construct designed to 
jump to the address of the particular procedure. 

154. A method as in claim 153, in which the programming 
construct includes an opaque computational function. 

155. A method of obfuscating a computer program or 
module, the computer program or module including a first 
procedure and a second procedure, the method including: 

performing an alteration on at least a portion of the 
computer program or module, the alteration being 
designed to render the computer program or module at 
least somewhat more resistant to reverse engineering or 
decompilation, the alteration including: 
generating a third procedure, the third procedure 
including: 
at least a portion of the first procedure; 
at least a portion of the second procedure; 
a parameter list for the first procedure; 
a parameter list for the second procedure; and 
a programming construct designed to allow a call to 
the third procedure to specify execution of the first 
procedure or the second procedure; 
replacing at least one call to the first procedure with a 
call to the third procedure, the call to the third 
procedure including information used by the pro- 
gramming construct to cause the third procedure to 
execute at least a portion of the first procedure.

156. A method as in claim 155, in which: 

the computer program or module is written in Java, and 
the first and second procedures comprise Java methods. 

157. A method as in claim 155, in which the programming 
construct includes an opaque variable. 

158. A method as in claim 155, in which the programming 
construct includes an opaque predicate. 
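
The procedure merging of claim 155 can be sketched as follows. The two original procedures (squaring an integer and building a greeting), the `selector` argument, and the class `MergedProcedures` are all hypothetical. The merged method carries both parameter lists and uses the selector to decide which body to execute.

```java
// Hypothetical sketch: two unrelated methods merged into a single method.
final class MergedProcedures {
    // Assumed originals: int square(int x) and String greet(String name).
    // The merged method takes both parameter lists plus a selector value.
    static Object merged(int selector, int x, String name) {
        if (selector == 0) {
            return x * x;           // body taken from the first procedure
        }
        return "hello " + name;     // body taken from the second procedure
    }

    public static void main(String[] args) {
        // A call square(4) is replaced by merged(0, 4, null).
        System.out.println(merged(0, 4, null));
        // A call greet("bob") is replaced by merged(1, 0, "bob").
        System.out.println(merged(1, 0, "bob"));
    }
}
```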

159. A method of obfuscating a computer program or 
module, the computer program or module including a loop, 
the loop including a body containing at least a first pro- 
gramming statement and a second programming statement, 
the method including: 

performing an alteration on at least a portion of the 
computer program or module, the alteration being 
designed to render the computer program or module at 
least somewhat more resistant to reverse engineering or 
decompilation, the alteration including: 
unrolling the loop to form an unrolled loop, the unrolling 
including replicating the body of the loop one or 
more times; 

splitting the unrolled loop into at least a first program- 
ming sequence and a second programming sequence, 
the first programming sequence including the first 
programming statement; and 
the second programming sequence including the 
second programming statement; 
whereby the first programming statement and the sec- 
ond programming statement are performed an 
equivalent number of times as in the unrolled loop. 
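
By way of illustration only, the unrolling and splitting recited in claim 159 might be sketched as follows. The function names, the two statements in the loop body, and the unrolling factor of two are all assumptions chosen for the example. The split is valid here because the second statement reads only values that the first sequence has already produced.

```python
def original(n):
    a = [0] * n
    b = [0] * n
    for i in range(n):
        a[i] = i * 2        # first programming statement
        b[i] = a[i] + 1     # second programming statement
    return a, b


def unrolled_and_split(n):
    a = [0] * n
    b = [0] * n

    # First programming sequence: the first statement, unrolled by two.
    i = 0
    while i + 1 < n:
        a[i] = i * 2
        a[i + 1] = (i + 1) * 2
        i += 2
    if i < n:               # leftover iteration when n is odd
        a[i] = i * 2

    # Second programming sequence: the second statement, unrolled by two.
    i = 0
    while i + 1 < n:
        b[i] = a[i] + 1
        b[i + 1] = a[i + 1] + 1
        i += 2
    if i < n:               # leftover iteration when n is odd
        b[i] = a[i] + 1

    return a, b
```

Each statement still executes exactly n times, so the transformed code computes the same arrays as the original loop.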

160. A method of obfuscating a computer program or 
computer program module, the computer program or mod- 
ule including at least a first class, the method including: 

performing an alteration on at least a portion of the 
computer program or module, the alteration being 
designed to render the computer program or module at 
least somewhat more resistant to reverse engineering or 
decompilation, the alteration including: 
generating a second class and a third class, the third 
class inheriting directly from the second class, and 
the second class and the third class being designed to 
replace the first class; 
incorporating the second class and the third class into 
the computer program or module; and 
removing the first class from the computer program or 
module. 
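
By way of illustration only, the class splitting recited in claim 160 might be sketched as follows. The class names and members are hypothetical. The first class is shown only in a comment, since the transformation removes it from the program.

```python
# Hypothetical first class, removed by the transformation:
#
#   class Account:
#       def __init__(self, balance): self.balance = balance
#       def deposit(self, amount):   self.balance += amount
#       def report(self):            return f"balance={self.balance}"


class AccountBase:
    """Second class: holds part of the original class's members."""

    def __init__(self, balance):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount


class AccountFull(AccountBase):
    """Third class: inherits directly from the second class and holds
    the remaining members, so together they replace the first class."""

    def report(self):
        return f"balance={self.balance}"


# Code that previously instantiated Account now instantiates AccountFull.
account = AccountFull(10)
account.deposit(5)
summary = account.report()
```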

161. A method of deobfuscating an obfuscated computer 
program or computer program module, the method includ- 
ing: 

loading the obfuscated computer program or module into 
a memory unit; 

identifying one or more opaque programming constructs 
included in the obfuscated computer program or 
module, wherein identifying the one or more opaque 
programming constructs includes at least one of: 
(a) performing pattern matching on the obfuscated 
computer program or module, wherein the pattern 
matching includes comparing one or more known 
opaque programming constructs to one or more 
programming constructs contained in the obfuscated 
computer program or module; 

(b) performing program slicing on the obfuscated com- 
puter program or module; and/or 

(c) performing statistical analysis on the computer 
program or module, wherein the statistical analysis 
includes (i) analyzing runtime characteristics of one 
or more predicates contained in the computer pro- 
gram or module, and (ii) determining that at least one 
predicate evaluates to a given value at least a pre- 
defined percentage of the time the computer program 
or module is run; 

evaluating the one or more opaque programming con- 
structs; and 

producing a deobfuscated computer program or module 
by replacing at least one of the one or more opaque 
programming constructs with equivalent, non-opaque 
programming constructs; 
whereby the deobfuscated computer program or module is 
rendered easier to understand, reverse engineer, or 
decompile than the obfuscated computer program or 
module. 
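
By way of illustration only, the statistical analysis of element (c) of claim 161 might be sketched as follows. The function name, the (name, value) observation format, and the 0.99 threshold are assumptions made for the example. A predicate flagged this way is only a suspect; the evaluating step of the claim would still be needed to confirm that it is in fact opaque.

```python
from collections import defaultdict


def find_suspected_opaque(observations, threshold=0.99):
    """Return {predicate_name: dominant_value} for predicates whose runtime
    outcomes equal one value at least `threshold` of the time.

    `observations` is an iterable of (predicate_name, bool) pairs gathered
    while running the program (a hypothetical trace format).
    """
    counts = defaultdict(lambda: [0, 0])      # name -> [false_count, true_count]
    for name, value in observations:
        counts[name][1 if value else 0] += 1

    suspects = {}
    for name, (false_count, true_count) in counts.items():
        total = false_count + true_count
        if true_count / total >= threshold:
            suspects[name] = True
        elif false_count / total >= threshold:
            suspects[name] = False
    return suspects


# Simulated trace: p1 is x*(x+1) % 2 == 0, which holds for every integer x;
# p2 is an ordinary predicate that varies with its input.
observations = []
for x in range(100):
    observations.append(("p1", x * (x + 1) % 2 == 0))
    observations.append(("p2", x % 3 == 0))

suspects = find_suspected_opaque(observations)
```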

162. A method of deobfuscating an obfuscated computer 
program or computer program module, the method includ- 
ing: 

loading the obfuscated computer program or module into 
a memory unit; 

determining that an obfuscation transformation has been 
applied to the obfuscated computer program or module; 
selecting one or more deobfuscation transformations to 
apply to the obfuscated computer program or module, 
the one or more deobfuscation transformations being 
operable to counteract at least some effects of the 
obfuscation transformation; and 
applying the one or more deobfuscation transformations 
to the obfuscated computer program or module, 
whereby the obfuscated computer program or module 
is rendered more amenable to reverse engineering, 
decompilation, or attack. 

163. A method as in claim 162, further including: 

performing a preprocessing pass on the obfuscated com- 
puter program or module, the preprocessing pass serv- 
ing to collect information about the obfuscated com- 
puter program or module; 
using the information gathered in the preprocessing pass 
in selecting the one or more deobfuscation transforma- 
tions to apply to the computer code. 

164. A method as in claim 163, in which performing the 
preprocessing pass includes performing data flow analysis 
on the obfuscated computer program or module. 

165. A method as in claim 163, in which performing the 
preprocessing pass includes performing an aliasing analysis 
on the obfuscated computer program or module. 

166. A method as in claim 163, in which performing the 
preprocessing pass includes performing one or more of the 
following: 

building an inheritance tree; 
building a symbol table; 
constructing at least one control flow graph; 
and performing theorem proving. 
167. A method as in claim 162, in which determining that 
an obfuscation transformation has been applied to the obfus- 
cated computer program or module includes at least one of: 
(a) performing pattern matching on the obfuscated com- 
puter program or module, wherein the pattern matching 
includes comparing one or more known opaque pro- 
gramming constructs to one or more programming 
constructs contained in the obfuscated computer pro- 
gram or module; 

(b) performing program slicing on the obfuscated com- 
puter program or module; and 

(c) performing statistical analysis on the computer pro- 
gram or module, wherein the statistical analysis 
includes (i) analyzing runtime characteristics of one or 
more predicates contained in the computer program or 
module, and (ii) determining that at least one predicate 
evaluates to a given value at least a predefined percent- 
age of the time the computer program or module is run. 

168. A method as in claim 162, in which the deobfuscation 
transformation includes: 

evaluating one or more opaque programming constructs 
contained in the computer program or module; and 

replacing the one or more opaque programming con- 
structs with equivalent, non-opaque programming con- 
structs. 

169. A method of obfuscating a computer program or 
computer program module, the computer program or module 
being designed to carry out one or more specified tasks, 
the method including: 
performing an alteration on at least a portion of the 
computer program or module, the alteration being 
designed to render the computer program or module at 
least somewhat more resistant to reverse engineering or 
decompilation, the alteration including: 
inserting one or more unnecessary program statements 
into the computer program or module, the one or 
more unnecessary program statements carrying out 
no function which contributes to the one or more 
specified tasks; 
and wherein the unnecessary program statements are 
designed to render program slicing at least somewhat 
more difficult or expensive to employ. 

170. A method as in claim 169, in which the unnecessary 
program statements introduce one or more parameter 
aliases. 

171. A method as in claim 169, in which the unnecessary 
program statements introduce one or more variable depen- 
dencies. 
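
By way of illustration only, the insertion of unnecessary statements recited in claims 169 and 171 might be sketched as follows. The function names, the junk variable, and the arithmetic applied to it are assumptions made for the example. The added statements contribute nothing to the result, but because the result variable now appears to depend on junk, a slice computed for the result would also include the statements that update junk.

```python
def total(values):
    s = 0
    for v in values:
        s += v
    return s


def total_obfuscated(values):
    s = 0
    junk = 1                             # unnecessary variable
    for v in values:
        junk = (junk * 31 + v) % 97      # unnecessary statement
        s += v
        s += junk - junk                 # always adds zero, yet makes s
                                         # appear to depend on junk
    return s


plain = total([1, 2, 3])
obfuscated = total_obfuscated([1, 2, 3])
```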